<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>The Last DBA on Last DBA</title><link>https://lastdba.com/en/</link><description>Recent content in The Last DBA on Last DBA</description><generator>Hugo -- gohugo.io</generator><language>en-US</language><copyright>© 2026 liuzhilong62</copyright><lastBuildDate>Fri, 29 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://lastdba.com/en/index.xml" rel="self" type="application/rss+xml"/><item><title>A DBA's Perspective on the 0526 Approved Database List</title><link>https://lastdba.com/en/2026/05/29/a-dbas-perspective-on-the-0526-approved-database-list/</link><pubDate>Fri, 29 May 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/05/29/a-dbas-perspective-on-the-0526-approved-database-list/</guid><description>&lt;blockquote&gt;&lt;p&gt;AI rate 5%&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;TL;DR
 &lt;div id="tldr" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tldr" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;On May 26, the Xinchuang Database List 2026 No. 2 was released, with 23 products passing (8 centralized + 15 distributed) — the most ever. Most notably: Ping An, UnionPay, China Mobile, and China Telecom — four major buyers — had their self-incubated databases debut on the list. The Xinchuang logic has changed — buyers are no longer just buyers.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Latest List
 &lt;div id="the-latest-list" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-latest-list" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Historical batch statistics for the Xinchuang database list. Data source: China Information Security Evaluation Center (itsec.gov.cn), 8 batches total, 4 containing databases.&lt;/p&gt;</description><content:encoded>&lt;blockquote&gt;&lt;p&gt;AI rate 5%&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;TL;DR
 &lt;div id="tldr" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tldr" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;On May 26, the Xinchuang Database List 2026 No. 2 was released, with 23 products passing (8 centralized + 15 distributed) — the most ever. Most notably: Ping An, UnionPay, China Mobile, and China Telecom — four major buyers — had their self-incubated databases debut on the list. The Xinchuang logic has changed — buyers are no longer just buyers.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Latest List
 &lt;div id="the-latest-list" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-latest-list" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Historical batch statistics for the Xinchuang database list. Data source: China Information Security Evaluation Center (itsec.gov.cn), 8 batches total, 4 containing databases.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;By Batch&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Batch&lt;/th&gt;
 &lt;th&gt;Date&lt;/th&gt;
 &lt;th&gt;Database Products&lt;/th&gt;
 &lt;th&gt;Achieved Level II&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;2023#1&lt;/td&gt;
 &lt;td&gt;2023-12-26&lt;/td&gt;
 &lt;td&gt;11 (centralized)&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2024#2&lt;/td&gt;
 &lt;td&gt;2024-09-30&lt;/td&gt;
 &lt;td&gt;17 (6 centralized + 11 distributed)&lt;/td&gt;
 &lt;td&gt;GaussDB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2025#2&lt;/td&gt;
 &lt;td&gt;2025-08-22&lt;/td&gt;
 &lt;td&gt;3 (centralized)&lt;/td&gt;
 &lt;td&gt;None&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2026#2&lt;/td&gt;
 &lt;td&gt;2026-05-26&lt;/td&gt;
 &lt;td&gt;23 (8 centralized + 15 distributed)&lt;/td&gt;
 &lt;td&gt;Dameng/Yashan/GaussDB/GoldenDB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;By Appearances (≥2 times)&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Vendor&lt;/th&gt;
 &lt;th&gt;Count&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Dameng&lt;/td&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GBASE&lt;/td&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Alibaba Cloud&lt;/td&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;HighGo&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Tencent Cloud&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;East Golden&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Vastdata&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Huawei Cloud&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ZTE (GoldenDB)&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;OceanBase&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Kingbase&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Shentong&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Xugu&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Yashan&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Only 1 time&lt;/td&gt;
 &lt;td&gt;PingCAP/Wanli/Uxin/Ping An/China Mobile/UnionPay/Telecom Cloud/Timecho/Transwarp/DolphinDB/Z-Range/CM Suzhou&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;By Category: Big Tech / Unicorn / Major Buyer&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Category&lt;/th&gt;
 &lt;th&gt;Vendors&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Big Tech&lt;/td&gt;
 &lt;td&gt;Huawei Cloud (GaussDB/TaurusDB/DWS), Alibaba Cloud (PolarDB/AnalyticDB), Tencent Cloud (TDSQL), ZTE (GoldenDB), OceanBase (Ant Group)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Unicorns&lt;/td&gt;
 &lt;td&gt;PingCAP (TiDB), Yashan (SICS), Transwarp (ArgoDB), Timecho (TimechoDB), DolphinDB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Major Buyers&lt;/td&gt;
 &lt;td&gt;Ping An Tech (RASESQL), China UnionPay (UPDRDB), China Mobile (Panwei + He3DB), China Telecom Cloud (TeleDB)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Traditional Xinchuang&lt;/td&gt;
 &lt;td&gt;Dameng, Kingbase, GBASE, Shentong, HighGo, Xugu, Vastdata, East Golden, Wanli, Uxin&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 class="relative group"&gt;The Floodgates Open
 &lt;div id="the-floodgates-open" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-floodgates-open" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When this list came out, my reaction was four words: &lt;strong&gt;the floodgates opened&lt;/strong&gt;. 23 products — the most ever. A few highlights:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ping An RASESQL.&lt;/strong&gt; The most unexpected. Ping An Group&amp;rsquo;s fintech capabilities have always been strong, but there was almost no public information about them building a database. Seeing &amp;ldquo;RASESQL&amp;rdquo; on the list stunned me for several seconds. A financial buyer of Ping An&amp;rsquo;s scale — once their self-developed database passes national testing, their internal Xinchuang replacement roadmap gains one more path.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;UnionPay UPDRDB.&lt;/strong&gt; Equally mysterious. I had no idea UnionPay was building a distributed database before this. UnionPay&amp;rsquo;s transaction volume speaks for itself — a distributed database that can handle their own business won&amp;rsquo;t be technically weak.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Alibaba Cloud PolarDB for MySQL.&lt;/strong&gt; The MySQL-compatible edition of PolarDB not passing had been something many people remembered. Now, all three of PolarDB&amp;rsquo;s main lines — PG edition, distributed edition, MySQL edition — have passed. Add AnalyticDB, and Alibaba Cloud&amp;rsquo;s database family is basically complete.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;China Mobile Panwei + China Telecom TeleDB.&lt;/strong&gt; China Mobile already had He3DB (CM Suzhou) pass national testing last year; this year Panwei is their second product. China Telecom TeleDB debuts. Both telecom operators now have their own incubated Xinchuang databases, which should significantly reduce their respective Xinchuang replacement pressure. Interestingly, China Unicom has been silent — their Xinchuang strategy is clearly different from Mobile and Telecom.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Transwarp ArgoDB.&lt;/strong&gt; Transwarp started in the big data/Hadoop ecosystem and now their distributed database has passed national testing. Once crowned &amp;ldquo;China&amp;rsquo;s First Domestic Big Data Infrastructure Software Stock&amp;rdquo; with a market cap exceeding 30 billion, their path from data lake to Xinchuang database has been validated.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Impact
 &lt;div id="impact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#impact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The most important signal from this floodgate opening: &lt;strong&gt;buyers can self-develop databases&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;What are the implications?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Major buyers who succeed at self-development don&amp;rsquo;t have to be lambs to the slaughter.&lt;/li&gt;
&lt;li&gt;Those major buyers who haven&amp;rsquo;t built one yet may restart their self-development efforts.&lt;/li&gt;
&lt;li&gt;The market share that big tech and unicorns could compete for in the domestic database market just shrank.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Financial industry players UnionPay and Ping An, telecom players China Mobile and China Telecom — all passed national testing, effectively earning a &amp;ldquo;R&amp;amp;D Success&amp;rdquo; gold badge. Internally, each organization must be celebrating. For external vendors, what they&amp;rsquo;ve lost isn&amp;rsquo;t just major clients — more precisely, &lt;strong&gt;they&amp;rsquo;ve lost absolute bargaining power&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;I know you&amp;rsquo;re in a tough spot, and I know you can&amp;rsquo;t afford not to buy, so I&amp;rsquo;ll swap the butcher&amp;rsquo;s knife for a dragon-slaying blade and slaughter you to death&amp;rdquo; — for buyers who successfully incubated their own databases, this kind of predicament has been substantially eased. That&amp;rsquo;s significant.&lt;/p&gt;
&lt;p&gt;As for where Xinchuang policy goes next, nobody can say. Based on previous lists, things should be getting stricter (last time only 3 databases passed), but this time they unexpectedly opened the floodgates. A sharp contraction next round isn&amp;rsquo;t impossible. Not just China Unicom — insurance industry players like CPIC and PICC, and even capable financial institutions, could consider jumping in to hand-roll their own database.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Bittersweet Reflections
 &lt;div id="bittersweet-reflections" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bittersweet-reflections" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since our kernel team sits right behind me, I have some understanding of the Xinchuang R&amp;amp;D process. After consecutive failed submissions, the entire team&amp;rsquo;s morale was extremely low. I believe we weren&amp;rsquo;t the only ones — many teams whose submissions failed felt the same. For industries like finance and telecom, there&amp;rsquo;s a Xinchuang mandate, but if your self-developed product doesn&amp;rsquo;t pass approval, there&amp;rsquo;s no choice at the corporate strategy level, and at the team level, there&amp;rsquo;s no reason for existence. That&amp;rsquo;s why &amp;ldquo;passing national testing&amp;rdquo; carries such weight and influence. Thankfully they passed — heartfelt congratulations to them! RaseSQL No.1!&lt;/p&gt;
&lt;p&gt;At the same time, it&amp;rsquo;s clear that Xinchuang results and direction are unstable, volatile, and impactful. It determines some companies&amp;rsquo; strategies and many people&amp;rsquo;s fates. I myself am even a piece on this wheel of fortune.&lt;/p&gt;
&lt;p&gt;Beyond those on the list, many organizations poured enormous effort but remain off the list. Their products might be terrible, or they might be excellent. But national testing is that stark watershed — a mysterious ticket of admission. &lt;strong&gt;Pass or fail — in the domestic market, those are two entirely different concepts.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;OK, just some thoughts — might delete later.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Reference
 &lt;div id="reference" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reference" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.itsec.gov.cn/aqkkcp/cpgg/" target="_blank" rel="noreferrer"&gt;https://www.itsec.gov.cn/aqkkcp/cpgg/&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Original link: &lt;a href="https://lastdba.com/2026/05/29/xinchuang-db-2026-review/" target="_blank" rel="noreferrer"&gt;https://lastdba.com/2026/05/29/xinchuang-db-2026-review/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>UUID v4 and v7: Collision Incidents and Performance Benchmarks</title><link>https://lastdba.com/en/2026/05/29/uuid-v4-and-v7-collision-incidents-and-performance-benchmarks/</link><pubDate>Fri, 29 May 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/05/29/uuid-v4-and-v7-collision-incidents-and-performance-benchmarks/</guid><description>&lt;blockquote&gt;&lt;p&gt;Source material: &lt;a href="https://news.ycombinator.com/item?id=48060054" target="_blank" rel="noreferrer"&gt;HN UUID v4 Collision Thread&lt;/a&gt;, &lt;a href="https://dev.to/umangsinha12/postgresql-uuid-performance-benchmarking-random-v4-and-time-based-v7-uuids-n9b" target="_blank" rel="noreferrer"&gt;dev.to UUID Benchmark&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;AI-generated ratio: 99%&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 class="relative group"&gt;TL;DR
 &lt;div id="tldr" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tldr" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;UUID v4 collided — someone on HackerNews actually hit a real collision. The root cause was a software stack bug, not math. v4 and v7 have no fundamental difference in collision safety. The real difference is index performance: v7 is time-ordered, B-tree is more compact, writes are 35% faster, indexes are 22% smaller. Your UUID v4 is probably fine, but if you care about index performance, switching to v7 is a cheap win.&lt;/p&gt;</description><content:encoded>&lt;blockquote&gt;&lt;p&gt;Source material: &lt;a href="https://news.ycombinator.com/item?id=48060054" target="_blank" rel="noreferrer"&gt;HN UUID v4 Collision Thread&lt;/a&gt;, &lt;a href="https://dev.to/umangsinha12/postgresql-uuid-performance-benchmarking-random-v4-and-time-based-v7-uuids-n9b" target="_blank" rel="noreferrer"&gt;dev.to UUID Benchmark&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;AI-generated ratio: 99%&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 class="relative group"&gt;TL;DR
 &lt;div id="tldr" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tldr" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;UUID v4 collided — someone on HackerNews actually hit a real collision. The root cause was a software stack bug, not math. v4 and v7 have no fundamental difference in collision safety. The real difference is index performance: v7 is time-ordered, B-tree is more compact, writes are 35% faster, indexes are 22% smaller. Your UUID v4 is probably fine, but if you care about index performance, switching to v7 is a cheap win.&lt;/p&gt;

&lt;h3 class="relative group"&gt;The UUID v4 Collision Incident
 &lt;div id="the-uuid-v4-collision-incident" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-uuid-v4-collision-incident" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A HackerNews thread blew up — &lt;a href="https://news.ycombinator.com/item?id=48060054" target="_blank" rel="noreferrer"&gt;Ask HN: We just had an actual UUID v4 collision&amp;hellip;&lt;/a&gt;, 479 upvotes, 347 comments.&lt;/p&gt;
&lt;p&gt;The OP&amp;rsquo;s own words:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;I know what you&amp;rsquo;re thinking&amp;hellip; and I still can&amp;rsquo;t believe it, but&amp;hellip; This morning, our database flagged a duplicate UUID (v4).&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;It wasn&amp;rsquo;t a double-insert bug. The code didn&amp;rsquo;t write it twice. Only ~15,000 rows in the table, using npm&amp;rsquo;s &lt;code&gt;uuid&lt;/code&gt; package &lt;code&gt;uuidv4()&lt;/code&gt;, and two rows created at different times collided on the same UUID:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;What&amp;rsquo;s the probability of a UUID v4 collision? 122 random bits, 2^122 ≈ 5.3×10^36 possibilities. With 15,000 records, collision probability is roughly 2×10^-29. Theoretically &amp;ldquo;impossible.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;But it happened.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Cause 1: Unreliable entropy sources
 &lt;div id="cause-1-unreliable-entropy-sources" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-1-unreliable-entropy-sources" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;HN&amp;rsquo;s top-voted comment (jandrewrogers):&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;UUIDv4 security depends on high-quality entropy sources. Hardware defects, software bugs, and misunderstandings of &amp;ldquo;high-quality entropy&amp;rdquo; all break this assumption. Detecting entropy source failures is expensive, so nobody checks — until a collision happens.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;UUID v4 is &lt;strong&gt;explicitly banned&lt;/strong&gt; in high-reliability systems because entropy source quality cannot be verified.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Cause 2: Known npm uuid package bugs
 &lt;div id="cause-2-known-npm-uuid-package-bugs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-2-known-npm-uuid-package-bugs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The npm uuid package README itself warns:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;This module may generate duplicate UUIDs when run in clients with deterministic random number generators, such as Googlebot crawlers.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;More seriously, its internal &lt;code&gt;rng()&lt;/code&gt; function has global mutable state. One commenter pointed out: calling &lt;code&gt;rng()&lt;/code&gt; and sending the result effectively &lt;strong&gt;overwrites someone else&amp;rsquo;s random number, and you can predict it&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Related commit: &lt;a href="https://github.com/uuidjs/uuid/commit/91805f665c38b691ac2cbd" target="_blank" rel="noreferrer"&gt;91805f665c&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Community advice: use Node.js built-in &lt;code&gt;crypto.randomUUID()&lt;/code&gt;, not the npm uuid package.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Cause 3: Linux kernel /dev/random race condition
 &lt;div id="cause-3-linux-kernel-devrandom-race-condition" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-3-linux-kernel-devrandom-race-condition" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Another comment:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;I encountered duplicate UUIDs during soak testing of a distributed system. After extensive debugging, I found it was a Linux kernel race condition bug — on multi-processor systems, two processes simultaneously reading /dev/random could, with extremely low probability (~one in a million), get the same bytes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 class="relative group"&gt;Cause 4: Go UUID library not checking return values
 &lt;div id="cause-4-go-uuid-library-not-checking-return-values" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-4-go-uuid-library-not-checking-return-values" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;blockquote&gt;&lt;p&gt;Early Go UUID libraries called random functions without checking the return value length. &amp;ldquo;Request N bytes, got 3 bytes back&amp;rdquo; never happened on most hardware, so nobody checked — until production, where it generated thousands of duplicate UUIDs.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4 class="relative group"&gt;Cause 5: Historical AMD CPU RNG defects
 &lt;div id="cause-5-historical-amd-cpu-rng-defects" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cause-5-historical-amd-cpu-rng-defects" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Certain AMD CPUs had built-in random number generator issues. VM environments can also &amp;ldquo;virtualize away&amp;rdquo; entropy — both time sources and entropy sources can degrade inside VMs.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;v4 and v7 have no fundamental difference in collision safety. The difference is in the first 48 bits — v4 is random, v7 is a timestamp. You&amp;rsquo;re unlikely to encounter timestamp source issues, and random source issues are equally rare. The HN thread is an interesting edge case. Knowing that a tiny number of people hit it is enough — you don&amp;rsquo;t need to distrust the UUID v4 in your own systems.&lt;/p&gt;
&lt;p&gt;When choosing v4 vs v7, what you should really look at isn&amp;rsquo;t collisions — it&amp;rsquo;s &lt;strong&gt;index performance&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;UUID v7 Performance Comparison in PG 16
 &lt;div id="uuid-v7-performance-comparison-in-pg-16" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#uuid-v7-performance-comparison-in-pg-16" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;UUID v7 has one concrete advantage over v4 in PostgreSQL: &lt;strong&gt;temporal clustering, more B-tree-friendly&lt;/strong&gt;. v4 can bloat and v7 can bloat too — the difference is simply that v7&amp;rsquo;s first 48 bits are time-ordered, so inserts concentrate on the right side of the B-tree, reducing page splits.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dev.to/umangsinha12/postgresql-uuid-performance-benchmarking-random-v4-and-time-based-v7-uuids-n9b" target="_blank" rel="noreferrer"&gt;Umang Sinha&amp;rsquo;s benchmark&lt;/a&gt; ran a rigorous comparison on a PG 16 Docker container (8 cores, 16GB, NVMe).&lt;/p&gt;

&lt;h4 class="relative group"&gt;Test Conditions
 &lt;div id="test-conditions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-conditions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; uuid_v4_test (id UUID &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, payload TEXT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; uuid_v7_test (id UUID &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, payload TEXT);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Value&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Data volume&lt;/td&gt;
 &lt;td&gt;10 million rows per table&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Batch size&lt;/td&gt;
 &lt;td&gt;10,000 rows per batch&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Client&lt;/td&gt;
 &lt;td&gt;Go + pq driver&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;UUID generation&lt;/td&gt;
 &lt;td&gt;Pre-generated in memory, not timed&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;Performance Results
 &lt;div id="performance-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#performance-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Metric&lt;/th&gt;
 &lt;th&gt;UUID v4&lt;/th&gt;
 &lt;th&gt;UUID v7&lt;/th&gt;
 &lt;th&gt;Improvement&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Write 10M rows&lt;/td&gt;
 &lt;td&gt;5 min 35 sec&lt;/td&gt;
 &lt;td&gt;3 min 38 sec&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;35% faster&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Table + index total size&lt;/td&gt;
 &lt;td&gt;3618 MB&lt;/td&gt;
 &lt;td&gt;3443 MB&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;5% smaller&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;B-tree index size&lt;/td&gt;
 &lt;td&gt;776 MB&lt;/td&gt;
 &lt;td&gt;602 MB&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;22% smaller&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Point lookup&lt;/td&gt;
 &lt;td&gt;0.167 ms&lt;/td&gt;
 &lt;td&gt;0.038 ms&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;4.4x faster&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Range scan&lt;/td&gt;
 &lt;td&gt;8.283 ms&lt;/td&gt;
 &lt;td&gt;3.791 ms&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;2.2x faster&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;Why Such a Big Difference
 &lt;div id="why-such-a-big-difference" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-such-a-big-difference" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/uuid-v4-structure.png" alt="UUID v4 bit structure" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/uuid-v7-structure.png" alt="UUID v7 bit structure" /&gt;&lt;/p&gt;
&lt;p&gt;UUID v4 is fully random. Newly inserted UUIDs scatter randomly across the B-tree index, causing massive page splits and severe index fragmentation. UUID v7 has a millisecond-precision timestamp in the first 48 bits, so newly generated UUIDs are naturally ordered — writes cluster on the right side of the B-tree, page splits drop dramatically, and the index is much more compact.&lt;/p&gt;
&lt;p&gt;The 22% smaller index isn&amp;rsquo;t magic — it&amp;rsquo;s &lt;strong&gt;reduced fragmentation&lt;/strong&gt;. Point lookups being 4x faster isn&amp;rsquo;t surprising either — fewer B-tree levels, higher cache hit rates.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;UUID v4 and v7 are identical in collision safety — both depend on entropy source quality, one fills the first 48 bits with random numbers, the other with a timestamp. Collisions are edge cases that a tiny number of people hit in specific environments. Your environment is probably fine — that basic judgment doesn&amp;rsquo;t change.&lt;/p&gt;
&lt;p&gt;What you really should think about is &lt;strong&gt;index performance&lt;/strong&gt;. v7&amp;rsquo;s temporal property makes B-trees more compact, with measured results of 35% faster writes, 22% smaller indexes, and 2-4x faster queries. If your system writes UUIDs at high volume, switching to v7 saves meaningful storage and CPU.&lt;/p&gt;
&lt;p&gt;PG 18 will natively support &lt;code&gt;gen_uuid_v7()&lt;/code&gt;. For now, generate UUIDs at the application layer. Whichever version you use, always add a UNIQUE constraint.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>When PostgreSQL Becomes AI's Hands — Bruce Momjian's MCP Server in Practice</title><link>https://lastdba.com/en/2026/05/27/when-postgresql-becomes-ais-hands-bruce-momjians-mcp-server-in-practice/</link><pubDate>Wed, 27 May 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/05/27/when-postgresql-becomes-ais-hands-bruce-momjians-mcp-server-in-practice/</guid><description>&lt;blockquote&gt;&lt;p&gt;Original: &lt;a href="https://momjian.us/main/writings/pgsql/mcp.pdf" target="_blank" rel="noreferrer"&gt;Building an MCP Server Using Postgres&lt;/a&gt;, Bruce Momjian, PGDay Armenia 2026, CC BY 4.0.&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;AI-generated ratio: 80%&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Bruce Momjian (PG core team, the one who has written release notes for 20+ years) recently gave a talk at PGDay Armenia 2026: &lt;a href="https://momjian.us/main/writings/pgsql/mcp.pdf" target="_blank" rel="noreferrer"&gt;Building an MCP Server Using Postgres&lt;/a&gt;. 70 slides, extremely dense. Theory and practice — a solid reference.&lt;/p&gt;
&lt;p&gt;Reading it directly is hard work. Even having AI interpret it probably won&amp;rsquo;t make sense at first glance. I had to read for a while and ask several questions before it clicked.&lt;/p&gt;</description><content:encoded>&lt;blockquote&gt;&lt;p&gt;Original: &lt;a href="https://momjian.us/main/writings/pgsql/mcp.pdf" target="_blank" rel="noreferrer"&gt;Building an MCP Server Using Postgres&lt;/a&gt;, Bruce Momjian, PGDay Armenia 2026, CC BY 4.0.&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;AI-generated ratio: 80%&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Bruce Momjian (PG core team, the one who has written release notes for 20+ years) recently gave a talk at PGDay Armenia 2026: &lt;a href="https://momjian.us/main/writings/pgsql/mcp.pdf" target="_blank" rel="noreferrer"&gt;Building an MCP Server Using Postgres&lt;/a&gt;. 70 slides, extremely dense. Theory and practice — a solid reference.&lt;/p&gt;
&lt;p&gt;Reading it directly is hard work. Even having AI interpret it probably won&amp;rsquo;t make sense at first glance. I had to read for a while and ask several questions before it clicked.&lt;/p&gt;
&lt;p&gt;These 70 slides can be cleanly split into two layers — the first half is theory, the second half is a hands-on demo. The two layers don&amp;rsquo;t have much to do with each other.&lt;/p&gt;
&lt;hr&gt;

&lt;h1 class="relative group"&gt;Theory Layer: Explaining the RAG → MCP Evolution Through Transformers (Slides 1-33)
 &lt;div id="theory-layer-explaining-the-rag--mcp-evolution-through-transformers-slides-1-33" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#theory-layer-explaining-the-rag--mcp-evolution-through-transformers-slides-1-33" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;The theory layer takes up nearly half the content, from LLM fundamentals to how MCP works. The outline is clear:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/mcp/outline.png" alt="Talk outline: Generative AI → LLM limitations → RAG → MCP → MCP Server in practice" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;RAG vs MCP: In One Sentence
 &lt;div id="rag-vs-mcp-in-one-sentence" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rag-vs-mcp-in-one-sentence" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Everyone knows the RAG workflow: the programmer decides what data to query → retrieval results are appended to the system prompt → the LLM reads and generates a response. &lt;strong&gt;Pre-orchestrated&lt;/strong&gt; — what the LLM can see is decided before the user even asks.&lt;/p&gt;
&lt;p&gt;MCP is different. Tool descriptions are registered with the LLM, and the LLM &lt;strong&gt;decides for itself&lt;/strong&gt; during generation whether to call a tool and which one. &lt;strong&gt;Dynamic decision-making&lt;/strong&gt; — the programmer only exposes tools, the LLM handles orchestration.&lt;/p&gt;
&lt;p&gt;Bruce sums it up in one sentence:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;RAG can only do what the programmer pre-planned. MCP can dynamically adjust based on output quality, can iteratively call multiple tools, and can trigger external tasks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;&amp;ldquo;Word or MCP&amp;rdquo; — That Set of Vector Embedding Diagrams
 &lt;div id="word-or-mcp--that-set-of-vector-embedding-diagrams" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#word-or-mcp--that-set-of-vector-embedding-diagrams" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Slides 18-33 are the core of the theory layer. Bruce draws a detailed internal Transformer flow diagram:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/mcp/mcp-servers.png" alt="MCP Server registered as Tool Embedding Vectors in the vector space" /&gt;&lt;/p&gt;
&lt;p&gt;His logic: take each MCP tool&amp;rsquo;s description text (e.g., &amp;ldquo;Return the radiation level (CPM) at 13 Roberts Road&amp;hellip;&amp;rdquo;), embed it into a vector using a text embedding model, and inject it into the attention layer&amp;rsquo;s vector space. Then at each inference step, the output vector matches against the nearest vector —&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/mcp/word-or-mcp.png" alt="The closest vector might be a text token, or an MCP tool" /&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&amp;ldquo;The closest vector might be a word or an MCP.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;Is This Model Correct?
 &lt;div id="is-this-model-correct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#is-this-model-correct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This is what puzzled me the most. Here are my thoughts.&lt;/p&gt;
&lt;p&gt;Bruce&amp;rsquo;s 15 slides are beautifully drawn, but if you try to understand them as engineering implementation, there are problems:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;① MCP tools don&amp;rsquo;t need &amp;ldquo;embedding.&amp;rdquo;&lt;/strong&gt; In actual engineering, tool definitions are written directly into the system prompt as text. The LLM reads &amp;ldquo;You have these tools: geiger(), get_pretzel_inventory()…&amp;rdquo; and uses semantic understanding to decide when to call them. There&amp;rsquo;s no need to compute tool descriptions as vectors, no need to do cosine distance comparisons against word vectors. The essence of Bruce&amp;rsquo;s teaching model is explaining &amp;ldquo;LLM decision-making&amp;rdquo; as &amp;ldquo;nearest vector matching&amp;rdquo; — this is closer to the retrieval paradigm than the generation paradigm.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;② Attention doesn&amp;rsquo;t produce a &amp;ldquo;find nearest&amp;rdquo; operation.&lt;/strong&gt; &lt;code&gt;output = Σ(softmax(Q·K) × V)&lt;/code&gt; yields a weighted-mixed context vector. There&amp;rsquo;s no step of &amp;ldquo;binary choice between the word embedding table and the tool embedding table.&amp;rdquo; The actual mechanism for LLM tool selection is: attention produces hidden states → LM head → softmax over vocabulary → output tool call JSON. There&amp;rsquo;s never a &amp;ldquo;word vs tool&amp;rdquo; choice, only a softmax over the entire vocabulary.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;③ System prompt and user prompt have no boundary in attention.&lt;/strong&gt; A token sequence is just a token sequence — attention blocks do Q·K dot products on all tokens equally. There is no &amp;ldquo;system zone&amp;rdquo; or &amp;ldquo;user zone.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;So these 33 theory slides can be seen as a simplified teaching model Bruce built for DBAs without an AI background — visually appealing and easy to understand, but don&amp;rsquo;t use it as an architecture diagram. MCP&amp;rsquo;s truly revolutionary aspect is &lt;strong&gt;protocol standardization&lt;/strong&gt; (unified tool registration/discovery/calling spec), not any vectorization trick.&lt;/p&gt;
&lt;hr&gt;

&lt;h1 class="relative group"&gt;Practice Layer: Two Working Demos (Slides 34-69)
 &lt;div id="practice-layer-two-working-demos-slides-34-69" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#practice-layer-two-working-demos-slides-34-69" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;Starting from Slide 34, the style abruptly shifts — all code, terminal output, hardware photos. That entire Transformer vector model from the theory layer completely disappears, replaced by &lt;code&gt;curl&lt;/code&gt;, &lt;code&gt;psql&lt;/code&gt;, and Perl scripts.&lt;/p&gt;
&lt;p&gt;The only thread connecting the two layers is that &amp;ldquo;they&amp;rsquo;re both talking about MCP.&amp;rdquo; But the vector matching mechanism painted in the theory layer and the actual implementation in the practice layer are nearly two different logic systems. This may be exactly the tension Bruce intended — the theory layer helps you understand why MCP is stronger than RAG, and the practice layer tells you how to actually implement it today.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Demo 1: Letting ChatGPT Read a Real-World Geiger Counter
 &lt;div id="demo-1-letting-chatgpt-read-a-real-world-geiger-counter" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#demo-1-letting-chatgpt-read-a-real-world-geiger-counter" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Bruce set up a GQ GMC-800 Geiger counter (radiation detector) in his backyard, connected via USB to a Raspberry Pi, taking environmental radiation readings every 15 minutes. First, see ChatGPT using MCP to call real data:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/mcp/chatgpt-weather.png" alt="ChatGPT querying weather via MCP" /&gt;&lt;/p&gt;
&lt;p&gt;MCP can call external tools to get real-time data — something RAG cannot do.&lt;/p&gt;
&lt;p&gt;Connected to hardware:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/mcp/geiger-counter.png" alt="GQ GMC-800 Geiger counter" /&gt;&lt;/p&gt;
&lt;p&gt;Wrote a Python wrapper using &lt;strong&gt;fastmcp&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;from&lt;/span&gt; fastmcp &lt;span style="color:#f92672"&gt;import&lt;/span&gt; FastMCP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mcp &lt;span style="color:#f92672"&gt;=&lt;/span&gt; FastMCP(&lt;span style="color:#e6db74"&gt;&amp;#34;Geiger counter MCP server&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;@mcp.tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;geiger&lt;/span&gt;() &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; int:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&amp;#34;Return the radiation level (CPM) at 13 Roberts Road, Newtown Square, PA, USA&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; subprocess&lt;span style="color:#f92672"&gt;.&lt;/span&gt;check_output(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;/var/lib/postgresql/tmp/geiger&amp;#34;&lt;/span&gt;, shell&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;, text&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The underlying layer is a Perl script that sends &lt;code&gt;&amp;lt;GETCPM&amp;gt;&amp;gt;&lt;/code&gt; over serial, reads back a 4-byte CPM value. Apache reverse-proxies port 443 (OpenAI only talks to 443). After registering with ChatGPT:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User: What&amp;#39;s the radiation level at 13 Roberts Road?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GPT: I don&amp;#39;t have public data for that location...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User: Use my custom app
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GPT: [calls geiger tool] → 14 CPM. Normal background radiation (5-25 CPM).
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User: Take five readings and give me the average
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GPT: [calls ×5] 15 16 13 15 15 → average 14.8 CPM&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Two key behaviors:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The LLM can iteratively call tools and compute&lt;/strong&gt; — RAG is a one-shot data dump, MCP is &amp;ldquo;call → get result → decide → call again → compute&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The user must explicitly authorize&lt;/strong&gt; — the first time, ChatGPT didn&amp;rsquo;t say &amp;ldquo;I have your Geiger counter data.&amp;rdquo; Only when the user said &amp;ldquo;use my custom app&amp;rdquo; did the tool call trigger. The security model is conservative&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Demo 2: Using PG as a Pretzel Shop Inventory System
 &lt;div id="demo-2-using-pg-as-a-pretzel-shop-inventory-system" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#demo-2-using-pg-as-a-pretzel-shop-inventory-system" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;From hardware back to software. Building a pretzel inventory database:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; pretzel (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; quantity INTEGER &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (quantity &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; pretzel &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- initial inventory 0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;MCP tools use &lt;code&gt;psql&lt;/code&gt; to operate on PG directly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-python" data-lang="python"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;@mcp.tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;get_pretzel_inventory&lt;/span&gt;() &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; int:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&amp;#34;Return the number of unsold pretzels&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; subprocess&lt;span style="color:#f92672"&gt;.&lt;/span&gt;check_output(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;psql --tuples-only -c &amp;#39;SELECT quantity FROM pretzel;&amp;#39; -d mcp&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shell&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;, text&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;@mcp.tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;sold_one_pretzel&lt;/span&gt;() &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; str:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&amp;#34;Call this when a pretzel is sold; reduces inventory by one&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; subprocess&lt;span style="color:#f92672"&gt;.&lt;/span&gt;check_output(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;psql --tuples-only -c &amp;#39;UPDATE pretzel SET quantity = quantity - 1;&amp;#39; -d mcp&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shell&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;, text&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;@mcp.tool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;def&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;baked_6_pretzels&lt;/span&gt;() &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; str:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&amp;#34;Call this when a tray of 6 pretzels is baked; increases inventory&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; subprocess&lt;span style="color:#f92672"&gt;.&lt;/span&gt;check_output(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;psql --tuples-only -c &amp;#39;UPDATE pretzel SET quantity = quantity + 6;&amp;#39; -d mcp&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shell&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;, text&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; )&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Interaction flow:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User: How many pretzels available?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GPT: 0 pretzels.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User: I just baked a tray → 6 pretzels
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User: I sold two → 4 remaining
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User: I sold four → 0 remaining
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User: I sold one pretzel → ERROR! CHECK constraint prevented negative quantity&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The LLM doesn&amp;rsquo;t write SQL directly — it calls your predefined, controlled interfaces. PG&amp;rsquo;s CHECK constraints naturally form a safety net — even if the LLM is tricked into calling the wrong function, the database-level constraint provides a second line of defense.&lt;/p&gt;
&lt;p&gt;But this also exposes a problem: the LLM faithfully executed &lt;code&gt;sold_one_pretzel&lt;/code&gt;, but didn&amp;rsquo;t anticipate that &amp;ldquo;inventory is 0, calling it will error.&amp;rdquo; &lt;strong&gt;MCP is the execution layer, not the reasoning layer.&lt;/strong&gt;&lt;/p&gt;
&lt;hr&gt;

&lt;h1 class="relative group"&gt;How Far from Production
 &lt;div id="how-far-from-production" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-far-from-production" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;On the final slide, Bruce frankly admits the current implementation&amp;rsquo;s limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No authentication&lt;/strong&gt; — anyone can call your MCP Server&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No parameterization&lt;/strong&gt; — all three tools are parameterless functions; real-world tools need to accept parameters&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No security restrictions on dynamic SQL&lt;/strong&gt; — tool descriptions declare semantics, but the LLM could be injected with malicious content&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Connection pooling, transaction management, rate limiting&lt;/strong&gt; — none addressed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Two recommended practical reads:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.pgedge.com/blog/lessons-learned-writing-an-mcp-server-for-postgresql" target="_blank" rel="noreferrer"&gt;pgedge.com: Lessons Learned Writing an MCP Server for PostgreSQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cardinalops.com/blog/mcp-defaults-hidden-dangers-of-remote-deployment/" target="_blank" rel="noreferrer"&gt;CardinalOps: MCP Defaults — Hidden Dangers of Remote Deployment&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;

&lt;h1 class="relative group"&gt;Between the Two Layers
 &lt;div id="between-the-two-layers" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#between-the-two-layers" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;Looking back at these 70 slides, the most interesting part isn&amp;rsquo;t any single demo — it&amp;rsquo;s how the theoretical thinking and hands-on work together explain what MCP can do:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The theory layer uses Transformer vector spaces to explain &amp;ldquo;how the LLM chooses between words and tools&amp;rdquo; — this is a teaching model&lt;/li&gt;
&lt;li&gt;The practice layer uses &lt;code&gt;psql&lt;/code&gt;, &lt;code&gt;curl&lt;/code&gt;, and Perl scripts to actually implement things — this is engineering&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The real MCP mechanism — tool definitions inserted as text into the system prompt, the LLM using semantic understanding to decide which tool to call, outputting tool call JSON — needs none of the vector embedding model from the theory layer. Between the two layers, Bruce didn&amp;rsquo;t draw the connecting line. This might not be a bug — it might be a feature.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>My Blog is Live</title><link>https://lastdba.com/en/2026/05/16/my-blog-is-live/</link><pubDate>Sat, 16 May 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/05/16/my-blog-is-live/</guid><description>&lt;h3 class="relative group"&gt;It&amp;rsquo;s Live!
 &lt;div id="its-live" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#its-live" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The blog is finally live.&lt;/p&gt;
&lt;p&gt;URL: &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;https://lastdba.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Accessible from China, mobile-friendly too.&lt;/p&gt;
&lt;p&gt;76 articles — all PostgreSQL writing from the past few years: case studies, internals, source code analysis, paper deep reads.&lt;/p&gt;
&lt;p&gt;This is a proper launch: new framework, new domain, new theme — rebuilt from the ground up.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Highlights
 &lt;div id="highlights" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#highlights" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Clean Interface&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Minimalist, reader-friendly design with a useful search feature.&lt;/p&gt;</description><content:encoded>
&lt;h3 class="relative group"&gt;It&amp;rsquo;s Live!
 &lt;div id="its-live" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#its-live" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The blog is finally live.&lt;/p&gt;
&lt;p&gt;URL: &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;https://lastdba.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Accessible from China, mobile-friendly too.&lt;/p&gt;
&lt;p&gt;76 articles — all PostgreSQL writing from the past few years: case studies, internals, source code analysis, paper deep reads.&lt;/p&gt;
&lt;p&gt;This is a proper launch: new framework, new domain, new theme — rebuilt from the ground up.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Highlights
 &lt;div id="highlights" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#highlights" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Clean Interface&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Minimalist, reader-friendly design with a useful search feature.&lt;/p&gt;
&lt;p&gt;

 


&lt;img src="https://lastdba.com/img/image-20260517000626083.png" alt="image-20260517000626083" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Framework: Jekyll → Hugo&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Version 1: Jekyll + minima theme + 2000 lines of CSS&lt;/p&gt;
&lt;p&gt;Version 2: Hugo + Blowfish theme + 0 lines of CSS&lt;/p&gt;
&lt;p&gt;V1 was decent, but building the UI myself was exhausting. I remembered vonng had written an article about website architecture choices, so I just went and borrowed from it. I explained the architecture to AI and had it learn from vonng.com — the page quality jumped up a level instantly. A few more tweaks and it was done.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Domain: github.io → lastdba.com&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Bought &lt;code&gt;lastdba.com&lt;/code&gt;, configured Cloudflare. GitHub Pages with custom domain, free HTTPS certificate, auto-renewal. Now accessible without VPN!&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Image Localization&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Previously, article images were scattered everywhere — CSDN CDN, GitHub PicBed, Modb OSS. CSDN has hotlink protection. GitHub PicBed on foreign networks often failed to load domestically. This time I had AI consolidate everything to local paths. No more worrying about image hosts going down. Cross-network image loading problems solved — very good.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Reflections on Going Live
 &lt;div id="reflections-on-going-live" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reflections-on-going-live" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;I&amp;rsquo;d actually set up a blog URL before — just fork a blog project and deploy via GitHub Pages. The domain was &lt;code&gt;liuzhilong62.github.io/blogs&lt;/code&gt;. But being somewhat of a quality freak (not really), the results were mediocre so I took it down. Later I just used the GitHub repo as my blog, without even enabling Pages. Recently, with more free time for various reasons, I revisited this and used Hermes to build the blog from scratch.&lt;/p&gt;
&lt;p&gt;As a DBA and backend engineer, I know nothing about frontend stuff like Jekyll, Hugo, Blowfish, CSS. I just give Hermes a target and it does the work. When it explains things to me I don&amp;rsquo;t understand (and I&amp;rsquo;m too embarrassed to admit it), I basically just say &amp;ldquo;keep going.&amp;rdquo; I check the result in the browser — if I&amp;rsquo;m satisfied, great; occasionally I say &amp;ldquo;revert this.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Honestly, my biggest takeaway from switching to Hugo wasn&amp;rsquo;t technical — it was &amp;ldquo;don&amp;rsquo;t reinvent the wheel.&amp;rdquo; I&amp;rsquo;d spent so much time hand-coding dark mode, TOC, search, only to discover a theme swap includes it all, and theirs looks better than mine.&lt;/p&gt;
&lt;p&gt;Also, after hooking up &lt;code&gt;lastdba.com&lt;/code&gt;, the blog suddenly felt &amp;ldquo;official.&amp;rdquo; &lt;code&gt;liuzhilong62.github.io/blogs&lt;/code&gt; felt like a personal experiment; now it feels like a real website. Same content, different feeling.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What It Cost
 &lt;div id="what-it-cost" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-it-cost" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;All expenses:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Item&lt;/th&gt;
 &lt;th&gt;Cost&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;lastdba.com&lt;/code&gt; domain (Cloudflare, 1 year)&lt;/td&gt;
 &lt;td&gt;¥70&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GitHub Pages hosting&lt;/td&gt;
 &lt;td&gt;¥0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Hugo framework&lt;/td&gt;
 &lt;td&gt;¥0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Blowfish theme&lt;/td&gt;
 &lt;td&gt;¥0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Cloudflare DNS + CDN&lt;/td&gt;
 &lt;td&gt;¥0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Tokens&lt;/td&gt;
 &lt;td&gt;¥60&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;¥130&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Possibly the most cost-effective personal website solution out there.&lt;/p&gt;
&lt;hr&gt;

&lt;h3 class="relative group"&gt;Finally
 &lt;div id="finally" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#finally" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Some details may not be polished — feedback, bug reports, and optimization suggestions welcome.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll likely keep updating.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Reference
 &lt;div id="reference" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reference" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://vonng.com/" target="_blank" rel="noreferrer"&gt;https://vonng.com/&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Original link: &lt;a href="https://lastdba.com/2026/05/16/" target="_blank" rel="noreferrer"&gt;https://lastdba.com/2026/05/16/&lt;/a&gt;个人博客上线/&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>Case Study: Startup Failure and SysV Shared Memory</title><link>https://lastdba.com/en/2026/03/09/case-study-startup-failure-and-sysv-shared-memory/</link><pubDate>Mon, 09 Mar 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/03/09/case-study-startup-failure-and-sysv-shared-memory/</guid><description>&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database instance&amp;rsquo;s RSS memory was maxed out, OOM messages appeared in the logs, and the instance died. We won&amp;rsquo;t analyze the OOM cause here.&lt;/p&gt;
&lt;p&gt;But startup kept failing — 4 or 5 attempts according to the logs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 2048, ID 1328250881&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 2048, ID 1328250881&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:12 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;794791&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:12 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;794791&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:37 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;801049&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:37 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;801049&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 794791&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:32:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;814396&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:32:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;814396&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 794791&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Startup succeeded after the DBA ran &lt;code&gt;ipcrm -m xxx&lt;/code&gt; before starting.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database instance&amp;rsquo;s RSS memory was maxed out, OOM messages appeared in the logs, and the instance died. We won&amp;rsquo;t analyze the OOM cause here.&lt;/p&gt;
&lt;p&gt;But startup kept failing — 4 or 5 attempts according to the logs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 2048, ID 1328250881&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:15:21 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;578272&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 2048, ID 1328250881&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:21:03 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;658824&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:12 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;794791&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:12 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;794791&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:37 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;801049&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:31:37 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;801049&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 794791&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:32:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;814396&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-02-12 09:32:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;814396&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 794791&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Startup succeeded after the DBA ran &lt;code&gt;ipcrm -m xxx&lt;/code&gt; before starting.&lt;/p&gt;
&lt;p&gt;Although the issue was quickly resolved, many questions remained:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why isn&amp;rsquo;t this scenario more common in practice?&lt;/li&gt;
&lt;li&gt;The start.log shows two different error types — what operations and logic do they correspond to?&lt;/li&gt;
&lt;li&gt;Can shared memory still exist even if the postmaster is gone?&lt;/li&gt;
&lt;li&gt;How do you locate and clean up this shared memory segment?&lt;/li&gt;
&lt;li&gt;PG has multiple shared memory segments — which one is this?&lt;/li&gt;
&lt;li&gt;Besides &lt;code&gt;ipcrm -m&lt;/code&gt;, are there other ways to get the instance started?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Error Analysis: &lt;code&gt;pre-existing shared memory block&lt;/code&gt;
 &lt;div id="error-analysis-pre-existing-shared-memory-block" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#error-analysis-pre-existing-shared-memory-block" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Three Types of Shared Memory
 &lt;div id="three-types-of-shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#three-types-of-shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Normally, after PG starts, there are three shared memory segments.&lt;/p&gt;
&lt;p&gt;Using the default &lt;code&gt;shared_memory_type='mmap'&lt;/code&gt; without huge pages as an example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## View PG&amp;#39;s actual shared memory usage from its virtual memory map&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;head -1 $PGDATA/postmaster.pid&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;/smaps | grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;\-s&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b61b0563000-2b61b0564000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;116293664&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b61b057f000-2b61b05b3000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:12 &lt;span style="color:#ae81ff"&gt;1501001168&lt;/span&gt; /dev/shm/PostgreSQL.1193490778
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b61bbac2000-2b61fa67a000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;1500999610&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From top to bottom, these are: &lt;strong&gt;the SysV shared memory used at startup&lt;/strong&gt;, &lt;strong&gt;shared memory for parallel queries&lt;/strong&gt;, and &lt;strong&gt;shared memory for shared_buffers&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;If shared_buffers uses huge pages, or if the shared_memory_type is SysV instead of mmap, the output differs slightly.&lt;/p&gt;
&lt;p&gt;Huge pages:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aaaaac00000-2aba9ca00000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:0e &lt;span style="color:#ae81ff"&gt;48453452&lt;/span&gt; /anon_hugepage &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b08f2eea000-2b08f2eeb000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;50692152&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b08f2f05000-2b08f302d000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:12 &lt;span style="color:#ae81ff"&gt;48436142&lt;/span&gt; /dev/shm/PostgreSQL.1345689218&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;shared_memory_type = &amp;lsquo;sysv&amp;rsquo;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b03b3ceb000-2b03b3d1f000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:12 &lt;span style="color:#ae81ff"&gt;1572332304&lt;/span&gt; /dev/shm/PostgreSQL.2883611352
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b03bf0c2000-2b03fdc7a000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;143917075&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Summary:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;PG Shared Memory Config&lt;/th&gt;
 &lt;th&gt;smaps Segments&lt;/th&gt;
 &lt;th&gt;shared_buffers smaps&lt;/th&gt;
 &lt;th&gt;sysv smaps&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;shared_memory_type=mmap, no huge pages&lt;/td&gt;
 &lt;td&gt;3 segments&lt;/td&gt;
 &lt;td&gt;/dev/zero&lt;/td&gt;
 &lt;td&gt;/SYSV00001000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;shared_memory_type=sysv, no huge pages&lt;/td&gt;
 &lt;td&gt;2 segments&lt;/td&gt;
 &lt;td&gt;/SYSV00001000&lt;/td&gt;
 &lt;td&gt;/SYSV00001000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;shared_memory_type=mmap, with huge pages&lt;/td&gt;
 &lt;td&gt;3 segments&lt;/td&gt;
 &lt;td&gt;/anon_hugepage&lt;/td&gt;
 &lt;td&gt;/SYSV00001000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;shared_memory_type=sysv, with huge pages&lt;/td&gt;
 &lt;td&gt;not supported&lt;/td&gt;
 &lt;td&gt;not supported&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Now the key question: when the error says &lt;code&gt;pre-existing shared memory block&lt;/code&gt;, which shared memory segment is it talking about?&lt;/p&gt;

&lt;h3 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Searching for the error message in the source quickly leads to the key location: &lt;code&gt;src/backend/port/sysv_shmem.c&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;First, understand what the SysV shmem is for. From scattered README content:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;We still require a SysV shmem block to
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * exist, though, because mmap&amp;#39;d shmem provides no way to find out how
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * many processes are attached, which we need for interlocking purposes.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * As of PostgreSQL 9.3, we normally allocate only a very small amount of
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * System V shared memory, and only for the purposes of providing an
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * interlock to protect the data directory. The real shared memory block
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * is allocated using mmap(). This works around the problem that many
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * systems have very low limits on the amount of System V shared memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * that can be allocated. Even a limit of a few megabytes will be enough
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; * to run many copies of PostgreSQL without needing to adjust system settings.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;SysV shmem can determine whether shared memory is still attached; mmap cannot&lt;/li&gt;
&lt;li&gt;This &lt;strong&gt;SysV shmem is used to protect the data directory&lt;/strong&gt;; shared_buffers uses mmap (by default), not SysV&lt;/li&gt;
&lt;li&gt;This SysV shmem segment is tiny (from the virtual addresses we can see it&amp;rsquo;s just 4K = 2b61b0563000-2b61b0564000)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now look at the shm state enum:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_ANALYSIS_FAILURE,	&lt;span style="color:#75715e"&gt;/* unexpected failure to analyze the ID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_ATTACHED,			&lt;span style="color:#75715e"&gt;/* pertinent to DataDir, has attached PIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_ENOENT,			&lt;span style="color:#75715e"&gt;/* no segment of that ID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_FOREIGN,			&lt;span style="color:#75715e"&gt;/* exists, but not pertinent to DataDir */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SHMSTATE_UNATTACHED			&lt;span style="color:#75715e"&gt;/* pertinent to DataDir, no attached PIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} IpcMemoryState;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The key states are ATTACHED, FOREIGN, and UNATTACHED.&lt;/p&gt;
&lt;p&gt;The SysV shmem protects the data directory — the common scenario is ensuring the directory isn&amp;rsquo;t running two instances. Since it&amp;rsquo;s shared memory, weird scenarios could mean the segment doesn&amp;rsquo;t belong to this directory or this process (FOREIGN state). If the shared memory corresponds to the data directory but no processes are running, it should be UNATTACHED. With processes running, it&amp;rsquo;s ATTACHED.&lt;/p&gt;
&lt;p&gt;Now look at the error thrown by &lt;code&gt;PGSharedMemoryCreate&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PGShmemHeader &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;PGSharedMemoryCreate&lt;/span&gt;(Size size,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 PGShmemHeader &lt;span style="color:#f92672"&gt;**&lt;/span&gt;shim)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;) &lt;span style="color:#75715e"&gt;// infinite loop
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{..
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; shmid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;shmget&lt;/span&gt;(NextShmemSegID, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(PGShmemHeader), &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;span style="color:#75715e"&gt;// shmget to fetch the SysV shmem and return its shmid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (shmid &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			oldhdr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			state &lt;span style="color:#f92672"&gt;=&lt;/span&gt; SHMSTATE_FOREIGN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			state &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;PGSharedMemoryAttach&lt;/span&gt;(shmid, NULL, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;oldhdr);&lt;span style="color:#75715e"&gt;// determine this shmem segment&amp;#39;s state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (state)&lt;span style="color:#75715e"&gt;// take different actions based on the shared memory state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...&lt;span style="color:#75715e"&gt;// only showing 2 states here: attached and unattached
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SHMSTATE_ATTACHED: &lt;span style="color:#75715e"&gt;// shm is attached — throw the error (this is the fault symptom we saw)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(FATAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_LOCK_FILE_EXISTS),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pre-existing shared memory block (key %lu, ID %lu) is still in use&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								(&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;) NextShmemSegID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								(&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;) shmid),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Terminate any old server processes associated with data directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 DataDir)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SHMSTATE_UNATTACHED:&lt;span style="color:#75715e"&gt;// shm is unattached
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * The segment pertains to DataDir, and every process that had
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * used it has died or detached. Zap it, if possible, and any
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * associated dynamic shared memory segments, as well. This
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * shouldn&amp;#39;t fail, but if it does, assume the segment belongs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * to someone else after all, and try the next candidate.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * Otherwise, try again to create the segment. That may fail
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * if some other process creates the same shmem key before we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * do, in which case we&amp;#39;ll try the next key.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;// The segment belongs to the data directory, and no process still holds it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (oldhdr&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;dsm_control &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;dsm_cleanup_using_control_segment&lt;/span&gt;(oldhdr&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;dsm_control);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;shmctl&lt;/span&gt;(shmid, IPC_RMID, NULL) &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					NextShmemSegID&lt;span style="color:#f92672"&gt;++&lt;/span&gt;; &lt;span style="color:#75715e"&gt;// Note: ShmemSegID increments and retries
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; }&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When shmem is ATTACHED, it throws the error. When unattached, it loops infinitely, trying to clean up the segment and incrementing ShmemSegID to request a new one.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first case corresponds to this fault&lt;/li&gt;
&lt;li&gt;The second case corresponds to normal crash recovery (instance can still start after a crash)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 class="relative group"&gt;SysV shmem
 &lt;div id="sysv-shmem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sysv-shmem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;From PG10 onwards, the postmaster.pid and SysV shmem logic was significantly reworked and has been largely stable since. This article only covers the PG10+ logic.&lt;/p&gt;
&lt;p&gt;pidfile.h:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LOCK_FILE_LINE_SHMEM_KEY	7&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;sysv_shmem.c, InternalIpcMemoryCreate():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		line[&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(line, &lt;span style="color:#e6db74"&gt;&amp;#34;%9lu %9lu&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;) memKey, (&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;) shmid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;AddToDataDirLockFile&lt;/span&gt;(LOCK_FILE_LINE_SHMEM_KEY, line);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the source code, shmem info is saved on line 7 of postmaster.pid, containing the shmkey and shmid.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;242712&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1772698474&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;8531&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/tmp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.0.0.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt; &lt;span style="color:#75715e"&gt;# &amp;lt;----here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ready&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;What Are shmkey and shmid?
 &lt;div id="what-are-shmkey-and-shmid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-are-shmkey-and-shmid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;In PG&amp;rsquo;s source, the call path is: InternalIpcMemoryCreate():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			shmid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;shmget&lt;/span&gt;(memKey, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, IPC_CREAT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IPC_EXCL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IPCProtection);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;PG uses shmkey/memkey as a seed key to request shared memory from the kernel, which returns a unique identifier, shmid.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;shmid is highly dependent on the server or rather the server&amp;rsquo;s memory state. For PG, when quickly restarting an instance, the shmid may be the same or +1 — this depends on Linux kernel internals. After a full server reboot, it&amp;rsquo;ll be completely different.&lt;/p&gt;
&lt;p&gt;To aid understanding: &lt;strong&gt;whether the server reboots or not, shmkey/memkey can remain constant (since it&amp;rsquo;s user/PG input). But across a server reboot, even with the same shmkey, the returned shmid is very unlikely to be the same value.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;How PG Obtains the shmkey
 &lt;div id="how-pg-obtains-the-shmkey" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-pg-obtains-the-shmkey" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PGSharedMemoryCreate():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * We use the data directory&amp;#39;s ID info (inode and device numbers) to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * positively identify shmem segments associated with this data dir, and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * also as seeds for searching for a free shmem key.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;stat&lt;/span&gt;(DataDir, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;statbuf) &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(FATAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not stat data directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						DataDir)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Loop till we find a free IPC key. Trust CreateDataDirLockFile() to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * ensure no more than one postmaster per data directory can enter this
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * loop simultaneously. (CreateDataDirLockFile() does not entirely ensure
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * that, but prefer fixing it over coping here.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	NextShmemSegID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; statbuf.st_ino;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		IpcMemoryId shmid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		PGShmemHeader &lt;span style="color:#f92672"&gt;*&lt;/span&gt;oldhdr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		IpcMemoryState state;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Try to create new segment */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		memAddress &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;InternalIpcMemoryCreate&lt;/span&gt;(NextShmemSegID, sysvsize);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (memAddress)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;				&lt;span style="color:#75715e"&gt;/* successful create and attach */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Check shared memory and possibly remove and recreate */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * shmget() failure is typically EACCES, hence SHMSTATE_FOREIGN.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * ENOENT, a narrow possibility, implies SHMSTATE_ENOENT, but one can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * safely treat SHMSTATE_ENOENT like SHMSTATE_FOREIGN.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		shmid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;shmget&lt;/span&gt;(NextShmemSegID, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(PGShmemHeader), &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG calls &lt;code&gt;stat()&lt;/code&gt; on the data directory, which returns the directory&amp;rsquo;s inode. PG directly uses &lt;code&gt;datadir.inode&lt;/code&gt; as the shmkey.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In PG, the shmem key is tightly coupled to the data directory&amp;rsquo;s inode. Under normal circumstances, shmem key = datadir inode.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Verification example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ls -id $PGDATA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; /lzlcloud/pg8574/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid |head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917090&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can see datadir.inode = shmkey = 4096.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PG shmkey in Cloud Environments
 &lt;div id="pg-shmkey-in-cloud-environments" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg-shmkey-in-cloud-environments" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Above I said generally shmkey = datadir.inode, but in cloud environments this is typically not the case.&lt;/p&gt;
&lt;p&gt;Our cloud environment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ls -id /lzlcloud/pg8298/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; /lzlcloud/pg8298/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ls -id /lzlcloud/pg8388/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; /lzlcloud/pg8388/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ls -id /lzlcloud/pg8095/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; /lzlcloud/pg8095/data&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat /lzlcloud/pg8298/data/postmaster.pid|head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;971833391&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat /lzlcloud/pg8388/data/postmaster.pid|head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4097&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;62128161&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat /lzlcloud/pg8095/data/postmaster.pid|head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4098&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143163441&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The data disk directories all have inode 4096, but the shmkeys are 4096, 4097, 4098.&lt;/p&gt;
&lt;p&gt;Why?&lt;/p&gt;
&lt;p&gt;The inode issue relates to the filesystem:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each filesystem has independent inodes&lt;/li&gt;
&lt;li&gt;The filesystem reserves some inodes — the first few are unusable. Depending on mount options, our data disk&amp;rsquo;s real inodes start at 4096&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So &lt;code&gt;datadir.inode = 4096&lt;/code&gt; is the default behavior of our cloud environment&amp;rsquo;s disk mounts. Other environments may differ — I haven&amp;rsquo;t analyzed those deeply. But with the same filesystem and mount approach for PG data directories, inode collisions are still possible.&lt;/p&gt;
&lt;p&gt;The shmkey issue relates to PG&amp;rsquo;s source code, PGSharedMemoryCreate():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; NextShmemSegID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; statbuf.st_ino;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		shmid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;shmget&lt;/span&gt;(NextShmemSegID, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(PGShmemHeader), &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (state)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SHMSTATE_FOREIGN:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				NextShmemSegID&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The initial shmkey = datadir.inode, but since the requested shmem might be FOREIGN (used by another process), PG increments shmkey by 1 and tries again.&lt;/p&gt;
&lt;p&gt;For example, the instance with shmkey=4097 in postmaster.pid: at startup it tried shmkey=4096, but found that shmid&amp;rsquo;s memory segment was already in use by another instance (the one with shmkey=4096). So it used shmkey+1 to request a different shmid segment.&lt;/p&gt;
&lt;p&gt;Similarly, the instance with shmkey=4098 had to increment twice to find a free shmkey-shmid pair.&lt;/p&gt;

&lt;h3 class="relative group"&gt;shmid Relationships
 &lt;div id="shmid-relationships" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shmid-relationships" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The SysV shmid can be found in &lt;strong&gt;the startup error log&lt;/strong&gt;, &lt;strong&gt;line 7 of postmaster.pid&lt;/strong&gt;, and &lt;strong&gt;virtual memory smaps&lt;/strong&gt;. It can be inspected via the &lt;code&gt;ipcs&lt;/code&gt; command and cleaned up with &lt;code&gt;ipcrm&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Example — note shmid=143917078 throughout:&lt;/p&gt;
&lt;p&gt;Startup error log:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 16:02:19 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;262388&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 4096, ID 143917078&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;postmaster.pid line 7:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid |head -7|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Virtual memory smaps:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;head -1 $PGDATA/postmaster.pid&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;/smaps | grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;\-s&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2ad2b5189000-2ad2b518a000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Inspecting and cleaning via SysV shmid:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt; &lt;span style="color:#75715e"&gt;# cleanup: ipcrm -m shmid&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;242712&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;242712&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;att_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 16:14:51 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;det_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 16:14:49 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;change_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 16:14:34 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Testing
 &lt;div id="testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;Reproducing the Production Issue
 &lt;div id="reproducing-the-production-issue" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reproducing-the-production-issue" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Hold a backend process alive indefinitely, then &lt;code&gt;kill -9&lt;/code&gt; the postmaster:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmem id&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;241567&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;64757&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; kill -stop &lt;span style="color:#ae81ff"&gt;107648&lt;/span&gt; &lt;span style="color:#75715e"&gt;# any backend&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; kill -9 &lt;span style="color:#ae81ff"&gt;64757&lt;/span&gt; &lt;span style="color:#75715e"&gt;# postmaster or another process&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;252283&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;64757&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#75715e"&gt;# nattch != 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; pg_ctl start -D $PGDATA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 16:02:19 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;262388&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 4096, ID 143917076&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 16:02:19 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;262388&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; stopped waiting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: could not start server&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;nattch=1 — the instance cannot start.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Normal Crash Recovery (Successful Startup)
 &lt;div id="normal-crash-recovery-successful-startup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#normal-crash-recovery-successful-startup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Essentially, kill the instance and then start it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmem id&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;154800&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134329&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; kill -9 &lt;span style="color:#ae81ff"&gt;134329&lt;/span&gt; &lt;span style="color:#75715e"&gt;# postmaster or another process&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmem id unchanged, segment still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;169360&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134329&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#75715e"&gt;# nattch=0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmem id unchanged, segment still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; pg_ctl start -D $PGDATA &lt;span style="color:#75715e"&gt;# startup succeeds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 16:14:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;242712&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 16:14:34 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;242712&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; &lt;span style="color:#75715e"&gt;# residual shmem cleaned up during startup&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ipcs: id &lt;span style="color:#ae81ff"&gt;143917077&lt;/span&gt; not found
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shmid incremented by 1 at startup&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;273571&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;242712&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; cat postmaster.pid &lt;span style="color:#75715e"&gt;# shmkey unchanged, shmid +1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917078&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A normal &lt;code&gt;kill -9&lt;/code&gt; followed by startup works fine — the residual shmem is cleaned up during startup. shmkey stays the same because inode=4096 and shmkey=4096 wasn&amp;rsquo;t occupied. shmid+1 is Linux kernel behavior, at least indicating a different shared memory segment was used.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Holding a File Descriptor But Not shmem
 &lt;div id="holding-a-file-descriptor-but-not-shmem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#holding-a-file-descriptor-but-not-shmem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Since startup is tied to the data directory inode, and inode is tied to shmem id, startup essentially &lt;strong&gt;checks whether the shmem is held by another process, not whether a file descriptor is still open&lt;/strong&gt;. So let&amp;rsquo;s test with the logger process, which holds file descriptors but not shared memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/77300/smaps | grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;\-s&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;# logger process — verify it has no shared memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -stop &lt;span style="color:#ae81ff"&gt;77300&lt;/span&gt; &lt;span style="color:#75715e"&gt;# stop logger&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -9 &lt;span style="color:#ae81ff"&gt;77076&lt;/span&gt; &lt;span style="color:#75715e"&gt;# kill -9 pm&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat postmaster.pid &lt;span style="color:#75715e"&gt;# file still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;77076&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/lzlcloud/pg8531/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1772700343&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;8531&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/tmp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.0.0.0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;143917080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ready
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917080&lt;/span&gt; &lt;span style="color:#75715e"&gt;# shared memory still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;77319&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;77076&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;att_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 17:27:11 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;det_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 17:27:15 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;change_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 16:45:43 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ps -ef|grep &lt;span style="color:#ae81ff"&gt;77300&lt;/span&gt; &lt;span style="color:#75715e"&gt;# process still alive&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;77300&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 16:45 ? 00:00:00 postgresql: lzldb: logger
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;135246&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;46622&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 17:27 pts/1 00:00:00 grep --color&lt;span style="color:#f92672"&gt;=&lt;/span&gt;auto &lt;span style="color:#ae81ff"&gt;77300&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl start -D $PGDATA &lt;span style="color:#75715e"&gt;# startup succeeds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 17:27:55 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;140497&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 17:27:55 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;140497&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The logger holds files in the data directory but is not associated with shared memory — it does not block startup.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Deleting postmaster.pid Then Failing to Start
 &lt;div id="deleting-postmasterpid-then-failing-to-start" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#deleting-postmasterpid-then-failing-to-start" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Same procedure: hold a backend process, &lt;code&gt;kill -9&lt;/code&gt; the PM, delete postmaster.pid, attempt startup.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ll skip the full output — result: startup fails with:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-06 15:29:48 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;22475&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 4098, ID 171868173&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-06 15:29:48 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;22475&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-06 15:29:48 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;22475&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: database system is shut down&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This shows: even with a zombie process holding shmem, deleting the postmaster.pid (which contains the shmid) doesn&amp;rsquo;t stop PG from finding the corresponding shmid.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Stop a Different Instance, Start the Current One
 &lt;div id="stop-a-different-instance-start-the-current-one" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#stop-a-different-instance-start-the-current-one" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PG analyzes shmid from two sources to determine if it belongs to the current instance:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The shmid corresponding to &lt;code&gt;datadir.inode&lt;/code&gt; as shmkey, or after &lt;code&gt;shmkey++&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The shmid stored in postmaster.pid&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Even if postmaster.pid is deleted, PG can still tell whether shmem is held by another process. But we can exploit datadir.inode and &lt;code&gt;shmkey++&lt;/code&gt; behavior to get it started.&lt;/p&gt;
&lt;p&gt;Since in our cloud environment all data directory inodes are 4096, and shmkeys differ due to the &lt;code&gt;shmkey++&lt;/code&gt; source logic, we can: &lt;strong&gt;start or stop a PG instance whose datadir.inode = 4096 to shift the current instance&amp;rsquo;s &lt;code&gt;shmkey++&lt;/code&gt; by one, obtaining a different shmid.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -stop &lt;span style="color:#ae81ff"&gt;165245&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -9 &lt;span style="color:#ae81ff"&gt;164411&lt;/span&gt; &lt;span style="color:#75715e"&gt;# stop current instance, keep one of its backend processes alive&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl stop -D /pg8531/data &lt;span style="color:#75715e"&gt;# stop a different instance&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to shut down.... &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server stopped
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl start -D /pg8574/data &lt;span style="color:#75715e"&gt;# try starting the current instance — fails because postmaster.pid still exists&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-05 18:22:35 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;196209&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: pre-existing shared memory block &lt;span style="color:#f92672"&gt;(&lt;/span&gt;key 4097, ID 143917087&lt;span style="color:#f92672"&gt;)&lt;/span&gt; is still in use
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 18:22:35 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;196209&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Terminate any old server processes associated with data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/pg8574/data&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; stopped waiting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: could not start server
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Examine the log output.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ mv /lzlcloud/pg8574/data/postmaster.pid&lt;span style="color:#f92672"&gt;{&lt;/span&gt;,.bak&lt;span style="color:#f92672"&gt;}&lt;/span&gt; &lt;span style="color:#75715e"&gt;# delete current instance&amp;#39;s postmaster.pid&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl start -D /lzlcloud/pg8574/data &lt;span style="color:#75715e"&gt;# try again — succeeds&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 18:23:09 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;207725&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: LOG: redirecting log output to logging collector process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-05 18:23:09 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;207725&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Future log output will appear in directory &lt;span style="color:#e6db74"&gt;&amp;#34;/lzlcloud/pg8574/data/pg_log&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ipcs -m -i &lt;span style="color:#ae81ff"&gt;143917087&lt;/span&gt; &lt;span style="color:#75715e"&gt;# the shmid&amp;#39;s SysV segment is still held by our zombie process&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared memory Segment shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;143917087&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; gid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cuid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; cgid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; access_perms&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bytes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; lpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;196209&lt;/span&gt; cpid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;164411&lt;/span&gt; nattch&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;att_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 18:22:35 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;det_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 18:22:35 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;change_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;Thu Mar &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; 18:21:04 &lt;span style="color:#ae81ff"&gt;2026&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Startup succeeds — the current instance requested a different shared memory segment. The old segment wasn&amp;rsquo;t cleaned up. This is the &amp;ldquo;hack&amp;rdquo; of stopping another instance to start the current one in a cloud environment.&lt;/p&gt;
&lt;p&gt;A small prerequisite: the other instance must have not only inode = current instance inode, but also shmkey &amp;lt; current instance shmkey.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Error Analysis: &lt;code&gt;lock file &amp;quot;postmaster.pid&amp;quot; already exists&lt;/code&gt;
 &lt;div id="error-analysis-lock-file-postmasterpid-already-exists" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#error-analysis-lock-file-postmasterpid-already-exists" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This problem is much simpler than the shared memory one.&lt;/p&gt;
&lt;p&gt;During startup, PG checks the lock file and its contained PID, in CreateLockFile():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (other_pid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; my_pid &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; other_pid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; my_p_pid &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			other_pid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; my_gp_pid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;kill&lt;/span&gt;(other_pid, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(errno &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; ESRCH &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; errno &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; EPERM))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* lockfile belongs to a live process */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(FATAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_LOCK_FILE_EXISTS),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;lock file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; already exists&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								filename),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 isDDLock &lt;span style="color:#f92672"&gt;?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 (encoded_pid &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Is another postgres (PID %d) running in data directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) other_pid, refName) &lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Is another postmaster (PID %d) running in data directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) other_pid, refName)) &lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 (encoded_pid &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Is another postgres (PID %d) using socket file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) other_pid, refName) &lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Is another postmaster (PID %d) using socket file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;?&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) other_pid, refName))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Testing is even simpler — just start it a second time while already running:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl start -D /pg8531/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: another server might be running; trying to start server anyway
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to start....2026-03-06 15:59:05 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;89145&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: FATAL: lock file &lt;span style="color:#e6db74"&gt;&amp;#34;postmaster.pid&amp;#34;&lt;/span&gt; already exists
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2026-03-06 15:59:05 CST::@:&lt;span style="color:#f92672"&gt;[&lt;/span&gt;89145&lt;span style="color:#f92672"&gt;]&lt;/span&gt;: HINT: Is another postmaster &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 255500&lt;span style="color:#f92672"&gt;)&lt;/span&gt; running in data directory &lt;span style="color:#e6db74"&gt;&amp;#34;/pg8531/data&amp;#34;&lt;/span&gt;?
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; stopped waiting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl: could not start server
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Examine the log output.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So the later errors in the fault&amp;rsquo;s start.log were because the instance was already running and someone tried starting it multiple more times.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When starting, PG first allocates a SysV shmem segment (not the mmap-based shared_buffers) to lock the data directory. The lock is obtained by using the data directory&amp;rsquo;s inode as the shmkey passed to &lt;code&gt;shmget()&lt;/code&gt;, which returns a unique shmid. Since the requested shmem may already be in use by another process, PG increments &lt;code&gt;shmkey++&lt;/code&gt; in an infinite loop until it finds an unclaimed segment. postmaster.pid line 7 stores both the shmkey and shmid. In cloud environments, you&amp;rsquo;ll often see adjacent PG instances with incrementing shmkeys — this happens because the data disks are mounted identically and share the same starting inode, causing &lt;code&gt;shmkey++&lt;/code&gt; to kick in.&lt;/p&gt;
&lt;p&gt;If a PG instance is killed unexpectedly, the shmem is not automatically cleaned up. Under normal conditions, no zombie process holds the shared memory, so startup cleans it up and proceeds normally. Under abnormal conditions, a zombie process still holds the shared memory — startup fails and manual intervention is required.&lt;/p&gt;
&lt;p&gt;Recommended handling:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;ipcrm -m&lt;/code&gt; (most recommended)&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;lsof&lt;/code&gt; to find the zombie process and kill it&lt;/li&gt;
&lt;li&gt;Reboot the host&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Not recommended but possible workarounds:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;mv postmaster.pid&lt;/code&gt; + stop a different PG instance (where the other instance&amp;rsquo;s shmkey &amp;lt; current instance&amp;rsquo;s shmkey)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mv postmaster.pid&lt;/code&gt; + remount the data disk to change its inode&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Finally, answering the opening questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why isn&amp;rsquo;t this scenario more common in practice?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Abnormal instance crash + zombie processes still alive. Many crash scenarios leave no zombie processes, so startup just works.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The start.log shows two different error types — what do they correspond to?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &amp;ldquo;shared memory in use&amp;rdquo; error means abnormal crash + zombie processes still exist. The &amp;ldquo;postmaster.pid already exists&amp;rdquo; error means the instance was started multiple times.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can shared memory still exist if the postmaster is gone?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Yes, shared memory can persist when the postmaster is gone — PG processes don&amp;rsquo;t always cleanly exit or get cleaned up by the OS. However, if &lt;em&gt;all&lt;/em&gt; processes are gone, the shared memory should not exist.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How do you locate and clean up this shared memory segment?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The shmid can be found in the startup error log (start.log). Clean it with &lt;code&gt;ipcrm -m $shmid&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG has multiple shared memory segments — which one is this?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The SysV shmem used to protect the data directory. It always exists. See the &amp;ldquo;Three Types of Shared Memory&amp;rdquo; section. It&amp;rsquo;s distinct from the mmap-based shared_buffers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can you find the corresponding shmem via inode or file?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Linux does not provide a userspace interface to find SysV shmem by inode or file (this statement is 100% AI-generated, cross-validated across multiple models). PG uses the data directory&amp;rsquo;s inode as a seed shmkey to request shared memory — it does not directly find shmem by inode. PG has its own mechanism for locating SysV shmem, but it&amp;rsquo;s not an absolute mapping; &lt;code&gt;shmkey++&lt;/code&gt; is a compromise startup logic for this reason.&lt;/p&gt;</content:encoded></item><item><title>DBA, Writing, Learning and the Future in the AI Era</title><link>https://lastdba.com/en/2026/01/21/dba-writing-learning-and-the-future-in-the-ai-era/</link><pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/01/21/dba-writing-learning-and-the-future-in-the-ai-era/</guid><description>&lt;blockquote&gt;&lt;p&gt;AI rate: This article has approximately 60% AI involvement, with about 20 rounds of battling with AI&lt;/p&gt;
&lt;p&gt;Recommendation reason: Contains some reflections and insights on AI Ops, hence recommended&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;Writing in the AI Era
 &lt;div id="writing-in-the-ai-era" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#writing-in-the-ai-era" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;For authors who write blogs or WeChat public accounts, AI may be a fatal blow, because AI writing is simply too easy. As someone who writes articles myself, I have many internal struggles about how AI affects writing habits, and it pains me too. Let me revisit some earlier thoughts on writing:&lt;/p&gt;</description><content:encoded>&lt;blockquote&gt;&lt;p&gt;AI rate: This article has approximately 60% AI involvement, with about 20 rounds of battling with AI&lt;/p&gt;
&lt;p&gt;Recommendation reason: Contains some reflections and insights on AI Ops, hence recommended&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;Writing in the AI Era
 &lt;div id="writing-in-the-ai-era" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#writing-in-the-ai-era" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;For authors who write blogs or WeChat public accounts, AI may be a fatal blow, because AI writing is simply too easy. As someone who writes articles myself, I have many internal struggles about how AI affects writing habits, and it pains me too. Let me revisit some earlier thoughts on writing:&lt;/p&gt;
&lt;p&gt;Why write?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For myself: To consolidate knowledge. Output is what strengthens input. Glancing at something once versus writing it out are completely different experiences — writing can take several times longer than just reading. For example, when you see a profound and seemingly familiar sentence, rewriting it yourself reveals countless details within it.&lt;/li&gt;
&lt;li&gt;For myself: To leverage others&amp;rsquo; biases constructively. Mainly to use readers&amp;rsquo; expectations as motivation to persist in writing and to enhance the credibility of content. Knowledge you consume yourself may be &amp;ldquo;good enough,&amp;rdquo; but writing for a public audience forces you to weigh every word and take responsibility for others. (Relatively speaking — not actual word-by-word scrutiny.)&lt;/li&gt;
&lt;li&gt;For myself: To build reputation. This depends heavily on writing quality.&lt;/li&gt;
&lt;li&gt;For others/the community: To spread knowledge. Good things should be shared and used by everyone — this is at the core of the PostgreSQL open-source community. Encouraging sharing, not hoarding, is a principle I&amp;rsquo;ve always upheld.&lt;/li&gt;
&lt;li&gt;Building connections: This wasn&amp;rsquo;t my goal, but I have indeed met some friends through it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Human writing was already difficult; in the AI era, human writing is essentially Hell Mode — like walking against the current without a destination, unable to see any light, while everyone else is heading the opposite direction. I&amp;rsquo;ve certainly experienced AI-powered interpretation, translation, and article generation, but it never feels like mine, or it loses the original purpose of training myself. Or, at a deeper level, I want to feel the vitality of the work.&lt;/p&gt;
&lt;p&gt;The DBA community&amp;rsquo;s articles can be described as a mixed bag — people write about everything. I&amp;rsquo;ve always preferred substantive, content-rich articles focused on PostgreSQL internals and operations, like those by &lt;strong&gt;Cancan and Xiangbo&lt;/strong&gt; — I eagerly anticipate every piece and read them carefully. Generally, content-oriented articles don&amp;rsquo;t get much traffic (both Cancan and Xiangbo have complained about this on their public accounts&amp;hellip;), and I&amp;rsquo;m quite easygoing about it myself.&lt;/p&gt;
&lt;p&gt;However, my previous article &amp;ldquo;PG Operations Database Operations Experience 2025&amp;rdquo; gained a surprising number of followers, which truly astonished me. So I&amp;rsquo;ve been pondering this question for days: &lt;strong&gt;Why would a non-AI-written, non-comprehensive, DBA-focused, knowledge-oriented article attract so much interest? What does AI mean for DBAs?&lt;/strong&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Reflections on Operations
 &lt;div id="reflections-on-operations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reflections-on-operations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The Essence of Operations and AI Ops
 &lt;div id="the-essence-of-operations-and-ai-ops" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-essence-of-operations-and-ai-ops" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Operations involve many things. To narrow the scope of discussion, I&amp;rsquo;ll focus on just one small part of operations work — &lt;strong&gt;incident response&lt;/strong&gt; — to interpret the essence of DB Ops. First, my position: &lt;strong&gt;&amp;ldquo;Operations is not merely a technical problem.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Many people argue that since both humans and AI make mistakes, AI can be given authority to act boldly — specifically, if AI&amp;rsquo;s error rate ≤ human error rate, replacement is justified. I thought the same two years ago, but I no longer do. Because the real-world environment is far more complex, with at least the following factors to consider:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The consensus problem.&lt;/strong&gt; There is consensus that a DBA might accidentally delete data, but another consensus is easily overlooked: in normal circumstances, the team assumes the DBA &lt;em&gt;won&amp;rsquo;t&lt;/em&gt; delete data. How to understand this? For example, when hiring a DBA, a responsible team will assess whether the person is mentally stable, then default to assuming they won&amp;rsquo;t delete data, and maintain this assumption throughout long-term work. At the very least, I don&amp;rsquo;t constantly worry that my colleague will drop the database. But when &amp;ldquo;hiring&amp;rdquo; an AI DBA, it has no mental state, and &lt;em&gt;no one&lt;/em&gt; assumes it won&amp;rsquo;t delete data. &amp;ldquo;It will delete data&amp;rdquo; is everyone&amp;rsquo;s consensus, creating deployment resistance.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The importance of data.&lt;/strong&gt; C-end (consumer) data and B-end (business) data have different importance levels. Retail, internet, government, and financial industry data also differ in criticality. The more an industry values data, the more sensitive it is to data reliability and business continuity. A personal computer has no business continuity and only one person cares about data reliability, but in the financial industry, business continuity can directly trigger widespread social concern — financial data reliability simply cannot be questionable. AI Ops deployment must consider system criticality; it cannot be rolled out across all domains simultaneously.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The management system.&lt;/strong&gt; For example, in financial systems, DBAs hold high privileges and are governed by a set of management procedures. So shouldn&amp;rsquo;t an AI DBA also have corresponding management procedures before it can be deployed? What about abnormal login detection, or abnormal backend access? How does it request permissions, and for how long? What level of permission in what scenario? These are all unresolved issues.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI&amp;rsquo;s own security.&lt;/strong&gt; For instance, the paper &lt;em&gt;STRATUS&lt;/em&gt; mentions &lt;em&gt;prompt injection attacks&lt;/em&gt;, for which there is currently no effective solution. If someone injects a &amp;ldquo;drop database&amp;rdquo; prompt, it might just execute it. But humans basically don&amp;rsquo;t have this problem — if you tell a DBA &amp;ldquo;drop database,&amp;rdquo; they&amp;rsquo;ll just ask you what you&amp;rsquo;re trying to do.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The responsibility problem.&lt;/strong&gt; Operations engineering is not a &amp;ldquo;knowledge problem&amp;rdquo; but a &amp;ldquo;responsibility problem.&amp;rdquo; One of the core tasks of operations is to make &lt;strong&gt;irreversible decisions about the system within limited time during an incident, and take responsibility for those actions.&lt;/strong&gt; AI can replace &amp;ldquo;formalizable operations&amp;rdquo; but cannot replace &amp;ldquo;judgments that must bear consequences&amp;rdquo; — at least not yet.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full of noise.&lt;/strong&gt; Operations is an &amp;ldquo;open system,&amp;rdquo; not a closed reasoning system. Databases run in extremely complex environments, while AI&amp;rsquo;s reasoning premise is that the world can be described in text. But the real operations world is filled with noise, contingency, and undocumented behaviors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Situational pressure.&lt;/strong&gt; Real business environments include recovery time pressure, organizational and customer emotional management, etc. The book &lt;em&gt;Google SRE&lt;/em&gt; describes a common recovery scenario: customers asking when it will be restored, leadership asking why failover hasn&amp;rsquo;t happened, engineers gathering various information under pressure while calling people to confirm recovery procedures. AI cannot feel this pressure. The first two questions are fundamentally not technical problems, but they must be answered. In real scenarios, the answers at that moment are likely to be rough at best.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&amp;rsquo;s imagine what conditions would be needed for fully automated AI operations to truly happen:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;AI won&amp;rsquo;t destroy critical data — at least, the vast majority of people need to reach this consensus about AI.&lt;/li&gt;
&lt;li&gt;Complete management procedures are needed, including how to grant AI permissions, just like how we grant DBA permissions.&lt;/li&gt;
&lt;li&gt;Solve the problem of AI itself being attacked. Not just LLMs, but the entire IT system encompassing AI.&lt;/li&gt;
&lt;li&gt;A no-blame operations culture (or eliminating operations altogether is another approach).&lt;/li&gt;
&lt;li&gt;Accept erroneous judgments. Form consensus around the existence of noise and environment, and tolerate AI Ops iteration cycles.&lt;/li&gt;
&lt;li&gt;If recovery takes too long or the blast radius expands, don&amp;rsquo;t allow human intervention — because if human intervention is required, that person is still the operator (semi-automated AI Ops?).&lt;/li&gt;
&lt;li&gt;Pressure-free recovery context. This means leaders, customers, and public opinion don&amp;rsquo;t need responses, or they trust some AI&amp;rsquo;s response. This is a human transformation, not an IT system transformation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;AIOps and Agent Research Results
 &lt;div id="aiops-and-agent-research-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#aiops-and-agent-research-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The Tsinghua &lt;a href="https://github.com/TsinghuaDatabaseGroup/AIDB" target="_blank" rel="noreferrer"&gt;AIDB&lt;/a&gt; repository&amp;rsquo;s directory contains many AI4DB papers — too many for a person to read. I used NotebookLM to summarize the paper categories:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/46fe43a8cb1f.png" alt="image-20260118161019199" /&gt;&lt;/p&gt;
&lt;p&gt;Again, to narrow the scope (mainly to reduce my own effort), let&amp;rsquo;s focus on database diagnostics content.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;AIOps has made decent academic progress.&lt;/strong&gt; AIOps research integrates machine learning, reinforcement learning, and large language models into database management, covering key tasks such as parameter tuning, index recommendations, query optimization, and fault diagnosis. The goal is to build &amp;ldquo;self-driving&amp;rdquo; database systems with self-awareness and self-healing capabilities. While significantly improving complex workload performance and operational efficiency, this also drives the DBA&amp;rsquo;s transformation from low-efficiency manual intervention to high-level architectural supervision.&lt;/p&gt;
&lt;p&gt;Regarding whether &amp;ldquo;DBAs will be eliminated,&amp;rdquo; current research trends and industry practices (especially self-driving databases and LLM applications) show that the DBA role is undergoing a profound transformation from &amp;ldquo;manual operator&amp;rdquo; to &amp;ldquo;senior manager/supervisor,&amp;rdquo; rather than simple replacement. &lt;strong&gt;The DBA&amp;rsquo;s core value will shift toward managing AI operations strategies, ensuring data security and compliance, and handling extreme anomaly scenarios that AI cannot resolve.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Another &lt;a href="https://mp.weixin.qq.com/s/urqh4NZDmkXvDllBCCdZDA" target="_blank" rel="noreferrer"&gt;AI Ops Frontier Survey&lt;/a&gt; article describes Agents this way:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;This shows that AI Agents are not a silver bullet. To apply Agents, we need not only progress at the model and agent level, but also sufficient support capabilities from the entire operational system — such as Kubernetes-like declarative interfaces, good observability, and reversible operation design. Stratus&amp;rsquo;s preliminary experiments demonstrate the potential of Agents in automated operations, &lt;strong&gt;but there remain enormous gaps in performance, reliability, and security before production deployment.&lt;/strong&gt;&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The development domain, fueled by the booming vibe coding movement, is clearly advancing much faster than AI in operations. I&amp;rsquo;d also love to have a confirm/redo operations remote control — the problem is, it doesn&amp;rsquo;t exist yet. Even if we fantasize about &amp;ldquo;vibe maintaining&amp;rdquo; one day, I doubt many ops people would turn on yolo mode.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Value of a DBA
 &lt;div id="the-value-of-a-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-value-of-a-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Is a DBA&amp;rsquo;s Value Just Being the Decision-Maker and Scapegoat?
 &lt;div id="is-a-dbas-value-just-being-the-decision-maker-and-scapegoat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#is-a-dbas-value-just-being-the-decision-maker-and-scapegoat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Endorsement indeed seems to be something AI cannot solve. So is the DBA&amp;rsquo;s value just being the decision-maker and scapegoat? After all, a DBA&amp;rsquo;s knowledge is far less than AI&amp;rsquo;s — it&amp;rsquo;s just that AI can&amp;rsquo;t make the final call.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Instantaneous Context&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;The DBA&amp;rsquo;s knowledge is far less than AI&amp;rsquo;s&amp;rdquo; — this is true for general knowledge (like how to optimize a SQL query, or the meaning of a configuration parameter). But AI lacks &lt;strong&gt;instantaneous runtime context&lt;/strong&gt;. AI knows database principles, but it doesn&amp;rsquo;t know the accumulated historical debt hiding behind the load balancer during the sudden traffic spike of your company&amp;rsquo;s Double Eleven (Singles&amp;rsquo; Day). The DBA possesses unstructured experience about &amp;ldquo;this specific machine, this specific business, these specific people.&amp;rdquo; In the face of extreme failures, &lt;strong&gt;AI offers the &amp;ldquo;highest-probability suggestion,&amp;rdquo; while the DBA offers &amp;ldquo;the operation that best preserves the system&amp;rsquo;s life under this specific pressure.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. The Last Gate of a Chaotic System&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The database is the most fragile and least fault-tolerant part of all IT architectures (code can be rolled back, but data loss can bankrupt a company). AI&amp;rsquo;s logic is extrapolation based on historical data. When encountering unprecedented underlying hardware bad sectors, extremely rare distributed deadlocks, or novel hacker attack methods, AI&amp;rsquo;s &amp;ldquo;suggestions&amp;rdquo; often fail or even cause secondary damage. The core of &amp;ldquo;making the call&amp;rdquo; is not &amp;ldquo;which solution to choose,&amp;rdquo; but &lt;strong&gt;&amp;ldquo;hedging against risk.&amp;rdquo;&lt;/strong&gt; This kind of &lt;strong&gt;control over extreme situations&lt;/strong&gt; is something current AI cannot provide.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Chain of Trust&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The DBA is the maintainer of the chain of trust: for example, if you let AI audit AI, then who audits the AI&amp;rsquo;s audit logic? At the levels of data security, compliance, and ethics, there must be &lt;strong&gt;a human with the highest privileges who can be held accountable&lt;/strong&gt; as the endpoint of the trust chain.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s flip the perspective: if DBAs really were just &amp;ldquo;less knowledgeable decision-makers and scapegoats,&amp;rdquo; then enterprises would have long ago transferred DBA decision-making authority to SREs, architecture committees, or even AI and other responsible entities. But the reality is, at truly critical moments, enterprises still call &amp;ldquo;that person.&amp;rdquo; This shows the question was never &amp;ldquo;who is smarter,&amp;rdquo; but who can bear the consequences for the organization amid uncertainty. &lt;strong&gt;The DBA is the last human in this chaotic database system who holds the authority to stop losses, the responsibility, and the terminal point of trust.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So is every decision made by the DBA? Obviously not. The DBA does not hold &amp;ldquo;objective decision-making authority&amp;rdquo; but rather &amp;ldquo;risk veto power&amp;rdquo; — they cannot decide whether the business should take risks, but they can determine which risks the system cannot bear. In simple, low-risk, rollback-able scenarios, decisions are often made automatically by processes or systems; &lt;strong&gt;only when decisions enter high-risk, irreversible territory where responsibility must converge is the DBA pushed to the forefront.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;The Uniqueness of the Postgres DBA
 &lt;div id="the-uniqueness-of-the-postgres-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-uniqueness-of-the-postgres-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;For the specific group of Postgres (PG) DBAs, this uniqueness is even more pronounced.&lt;/p&gt;
&lt;p&gt;In modern technical organizations, DBAs do not naturally hold architectural decision-making authority, nor do they monopolize index or parameter formulation. Architects can design solutions, developers can write SQL, and AI can even provide seemingly comprehensive best-practice recommendations. But these decisions mostly occur at the &lt;strong&gt;abstraction layer, design layer, and probability layer&lt;/strong&gt; — they assume the system is rollback-able, replay-able, and correctable.&lt;/p&gt;
&lt;p&gt;Postgres&amp;rsquo;s uniqueness lies in the fact that &lt;strong&gt;it hands a great deal of freedom to its users&lt;/strong&gt;, and these freedoms ultimately translate into &lt;strong&gt;long-term side effects&lt;/strong&gt; in real systems: write amplification, I/O pattern changes, Vacuum imbalance, WAL bloat, and unpredictable performance degradation. These side effects cannot be fully rehearsed at the design stage, cannot be subcontracted to a single role, and cannot simply be &amp;ldquo;withdrawn&amp;rdquo; after an incident occurs. When the system enters an unstoppable, unreplayable state, the only person still responsible for the overall outcome is often the DBA.&lt;/p&gt;
&lt;p&gt;Therefore, the value of a Postgres DBA lies not in &amp;ldquo;making decisions for others&amp;rdquo; (though you certainly can), but in continuously managing the real-world consequences of all decisions after they have already been made. &lt;strong&gt;&amp;ldquo;Architects define the ideal, developers implement functionality, AI predicts the future; and the DBA guards reality.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This ability to guard reality is based on the PG DBA having sufficient understanding of Postgres, sufficient understanding of the system&amp;rsquo;s real environment, sufficient understanding of the system&amp;rsquo;s history, and sufficient immediate context. In the AI era, one more thing needs to be added: sufficient understanding of AI.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Keep Learning
 &lt;div id="why-keep-learning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-keep-learning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;In the past two years, I&amp;rsquo;ve heard &amp;ldquo;learning is useless&amp;rdquo; rhetoric more than ever before. I generally scoff at such talk. Let me take this opportunity to properly address it.&lt;/p&gt;
&lt;p&gt;Does foundational database knowledge still have value? The answer is: &lt;strong&gt;its value is higher than ever.&lt;/strong&gt; Let&amp;rsquo;s interpret this from three angles: the right to explain, active learning, and why I keep revisiting the classics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. The Right to Explain&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Foundational knowledge enables three things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Identifying &amp;ldquo;systemic inevitable failure points&amp;rdquo; in advance&lt;/li&gt;
&lt;li&gt;Clearly articulating the judgment logic&lt;/li&gt;
&lt;li&gt;Transforming &amp;ldquo;I&amp;rsquo;m going with my gut&amp;rdquo; into &amp;ldquo;this is the system-determined outcome&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The true meaning of learning database fundamentals is not to &amp;ldquo;do more work,&amp;rdquo; but to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Delineate responsibility boundaries&lt;/li&gt;
&lt;li&gt;Enhance discourse power&lt;/li&gt;
&lt;li&gt;Let the system endorse your judgments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;2. Active Learning Becomes an Even Rarer Ability&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In the AI era, the &amp;ldquo;technical barrier&amp;rdquo; to knowledge acquisition approaches zero. Active learning ability is scarce. Why is &amp;ldquo;active learning&amp;rdquo; even rarer in the AI era? This is counter-intuitive but very real. AI makes &amp;ldquo;passive learning&amp;rdquo; extremely comfortable — ask and answer anytime, no long-term investment required, no need to endure cognitive discomfort. But the result is that more and more people stay in the &amp;ldquo;instant gratification layer,&amp;rdquo; unwilling to learn foundational knowledge anymore. When everyone else is regressing, you find yourself advancing faster.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Why Do I Keep Re-reading Classics like &lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Technical people need to read books because books don&amp;rsquo;t just give answers — they &lt;strong&gt;help build a cognitive model that can be run repeatedly and continuously refined.&lt;/strong&gt; AI is currently better at answering questions rather than shaping such models. AI struggles to become this kind of &amp;ldquo;long-term dialogue partner&amp;rdquo; — &lt;strong&gt;its answers are unstable.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From another perspective, looking at the value of books through cognitive economics + information theory + token cost: First, you don&amp;rsquo;t need to battle with AI back and forth. The real cost of battling is not money, but your attention and context-maintenance ability. Second, the hundreds of thousands of words in a book require neither massive prompt input nor excessive token expenditure from you. Third, the knowledge in books has been repeatedly verified by authors and readers — it is already compressed knowledge, the easiest to learn. So: &lt;strong&gt;Classic books = extremely low token cost to obtain high-density, human-repeatedly-verified, focused knowledge in compressed form.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Learning AI Itself&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This needs no elaboration from me.&lt;/p&gt;
&lt;p&gt;My battles with AI led me to an interesting conclusion about &amp;ldquo;why read books&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;In the AI era, knowledge is cheap, but judgment is expensive =&amp;gt; And judgment comes from a stable, calibratable cognitive model =&amp;gt; A stable cognitive model is itself a byproduct of &amp;ldquo;long-term high-quality knowledge intake.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Another real-world piece of evidence supporting the &amp;ldquo;reading is useful&amp;rdquo; argument: this very article depends on books, papers, other articles, and information I&amp;rsquo;ve read. Without that foundation, this article would not exist.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why People Still Love Reading &amp;ldquo;Human-Written&amp;rdquo; Articles
 &lt;div id="why-people-still-love-reading-human-written-articles" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-people-still-love-reading-human-written-articles" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The Psychology of Preferring Imperfection
 &lt;div id="the-psychology-of-preferring-imperfection" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-psychology-of-preferring-imperfection" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Human-written technical articles are inevitably riddled with flaws. Looking back at my own &amp;ldquo;Operations Experience 2024&amp;rdquo; from last year, I can find many holes. Even &amp;ldquo;Operations Experience 2025,&amp;rdquo; completed just days ago, I consider incomplete — vastly different from something AI would write. So why do readers still enjoy such flawed technical articles?&lt;/p&gt;
&lt;p&gt;The reason may be that humans are not attracted by &amp;ldquo;information correctness,&amp;rdquo; but by &lt;strong&gt;&amp;ldquo;empathetically imperfect traces of a human mind.&amp;rdquo;&lt;/strong&gt; From the book &lt;em&gt;A Brief History of Intelligence&lt;/em&gt;, we know that the human intelligence model inherently includes self-trial-and-error exploration and observing others&amp;rsquo; behaviors to map onto oneself — this is a learning process, innate to humans. Our brains automatically scan text for hesitation, uncertainty, logical gaps, awkward expressions, emotional leakage, etc. — all things systematically absent from AI text. In the imperfections and emotions of writing, readers can feel the author&amp;rsquo;s thinking and emotions, whereas AI merely presents results. Readers almost never have emotional resonance with AI. Generally speaking, only those who have truly experienced something leave these &amp;ldquo;unattractive&amp;rdquo; traces.&lt;/p&gt;
&lt;p&gt;So I believe many people, like me, can identify purely AI-written technical articles at a glance (not guaranteed 100% accurate) and generally won&amp;rsquo;t have the emotional drive to read through them. But if it&amp;rsquo;s something a human has seriously written, they&amp;rsquo;ll read carefully, feeling the author&amp;rsquo;s feelings, catching their shortcomings or contextual gaps.&lt;/p&gt;
&lt;p&gt;Of course, authors could feed prompts to mimic their previous writing style or deliberately leave flaws. But I haven&amp;rsquo;t seriously tried this — I briefly generated a few pieces and felt the emotional immersion was still quite poor. I don&amp;rsquo;t plan to explore this further; there&amp;rsquo;s not much point.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Borrowing an Expert&amp;rsquo;s ATTENTION for Free
 &lt;div id="borrowing-an-experts-attention-for-free" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#borrowing-an-experts-attention-for-free" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The core difference between AI articles and expert articles is not &amp;ldquo;how well they&amp;rsquo;re written,&amp;rdquo; but a matter of &lt;strong&gt;economic questioning and industry-leading Attention.&lt;/strong&gt; Expert writing is about allocating attention on behalf of the reader; AI writing is about avoiding missing any potentially relevant information. This is not a capability issue — it&amp;rsquo;s a difference in objective functions. &lt;strong&gt;Truly high-value technical articles don&amp;rsquo;t tell you all the correct answers — they block 80% of what you shouldn&amp;rsquo;t be paying attention to right now.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Why do experts dare to &amp;ldquo;delete,&amp;rdquo; while AI doesn&amp;rsquo;t? Because they bear cognitive responsibility for your understanding outcomes. AI does not bear the consequences of you learning or applying things wrong. So experts deliberately filter out details that don&amp;rsquo;t need attention right now. This filtering is itself the value of expertise. For humans, the bottleneck in learning is not insufficient information, but limited attention and not knowing where to look first. &lt;strong&gt;An expert&amp;rsquo;s article directly hands you the result and says: just focus on this. But when facing an LLM, do you know what to look for?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is not a denial of the value of AI articles. AI excels at &amp;ldquo;rapidly expanding the information space when you already know the problem boundaries,&amp;rdquo; while expert articles excel at &amp;ldquo;contracting the problem space for you before you&amp;rsquo;ve established a judgment framework.&amp;rdquo; The former is good for filling gaps and lateral expansion; the latter is good for building core understanding and key intuition. The truly efficient learning approach is not choosing one over the other, but first using experts to achieve Attention alignment, then using AI to do amplified search within the bounded space.&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t saying AI content is useless or human-written content is useless — it&amp;rsquo;s that &lt;strong&gt;each has its own use.&lt;/strong&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Can AGI Solve All Problems?
 &lt;div id="can-agi-solve-all-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#can-agi-solve-all-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Refuting Musk
 &lt;div id="refuting-musk" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#refuting-musk" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Recently Musk has been painting big pictures again. After reading, I don&amp;rsquo;t agree.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Shared Prosperity or the Useless Class?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &amp;ldquo;useless class&amp;rdquo; is a concept from Yuval Noah Harari&amp;rsquo;s &lt;em&gt;Homo Deus&lt;/em&gt;. He argues that when AI&amp;rsquo;s productivity surpasses that of ordinary people, using AI to do work will replace having ordinary people do it. These people become the useless class. Resources will increasingly concentrate in the hands of a few elites and large corporations, and most people will lose their jobs — yet there is currently no effective policy to provide a safety net. This view happens to contradict the Musk-style shared prosperity vision. Musk believes that when AGI is realized, no one will need to worry about survival, education, or healthcare — productivity will be so high that governments will provide a safety net for most people. I currently support Harari&amp;rsquo;s view. In fact, from anecdotal perceptual statistics around us, &lt;strong&gt;the population of the useless class is indeed rising.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Can High Productivity Create a Utopia?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;One theory supporting my disagreement with the AGI utopia comes from another book, &lt;em&gt;Evolutionary Psychology — Mate Selection Criteria.&lt;/em&gt; One particularly striking insight: &lt;em&gt;Due to social division of labor and the biological drive to raise well-adapted offspring, men tend to prefer young, healthy women, while women tend to prefer healthy, resourceful men.&lt;/em&gt; This default filtering engraved in our genes means that humans cannot live equally — you don&amp;rsquo;t want to be the one eliminated. &lt;strong&gt;So if a non-comparing, non-competitive, resource-equal utopia could be sustained, productivity is merely one necessary condition among many&lt;/strong&gt; — there are many other social problems that must be solved, which the public tends to overlook. This isn&amp;rsquo;t narrowly referring only to evolutionary psychology; some things haven&amp;rsquo;t been carefully discussed, such as the power struggles in &lt;em&gt;Chimpanzee Politics&lt;/em&gt;, which should also be considered.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Calhoun&amp;rsquo;s Mouse Utopia Experiment&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In 1972, animal behaviorist John B. Calhoun designed and described in detail a famous experimental environment — &amp;ldquo;Universe 25.&amp;rdquo; This was a laboratory &amp;ldquo;utopia&amp;rdquo; specially crafted for mice, striving for perfection in almost every aspect: abundant food, water, and nesting materials; regularly cleaned living environment; no predator threats; temperature maintained between 20°C and 31°C via fans and heating, stable and comfortable.&lt;/p&gt;
&lt;p&gt;The mouse population&amp;rsquo;s march toward extinction seems somewhat insane. I&amp;rsquo;ll focus only on the process: 1) Increased violence 2) No longer pursuing the opposite sex 3) Increased homosexual behavior 4) Increased solitary behavior 5) Males grooming themselves excessively 6) Apathy, etc. Of course, this experiment has flaws. From the intelligence model described in &lt;em&gt;A Brief History of Intelligence&lt;/em&gt;, the intelligence gap between mice and humans still spans a primate — it cannot represent human society. But at minimum, it shows that &lt;strong&gt;utopia triggers new social problems; people won&amp;rsquo;t just quietly live their lives.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. An Economics-Based Society&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From the perspective of modern economics, whether AGI can achieve a &amp;ldquo;shared-prosperity utopia&amp;rdquo; can be divided into two types: retaining the modern economy or not retaining it.&lt;/p&gt;
&lt;p&gt;If we retain the modern economy, AGI can be viewed as an extremely efficient &amp;ldquo;universal factor of production.&amp;rdquo; It significantly reduces the costs of knowledge production, decision support, organizational coordination, and marginal labor, raising the ceiling of society-wide productivity. Under this premise, wealth distribution, public service provision, and social security mechanisms still rely on markets, price signals, incentive structures, and institutional constraints. AGI&amp;rsquo;s role is more about expanding the size of the &amp;ldquo;distributable pie&amp;rdquo; rather than automatically solving distribution problems. In other words, &lt;strong&gt;shared prosperity remains a political economy problem, not a technical one. AGI can only lower the cost of achieving goals; it cannot replace institutional design.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;So, if we don&amp;rsquo;t retain the modern economy and instead try to bypass markets, prices, and incentive systems to directly rely on AGI to achieve some kind of &amp;ldquo;techno-utopia&amp;rdquo; — is it feasible?&lt;/p&gt;
&lt;p&gt;The answer can almost certainly be determined as: no.&lt;/p&gt;
&lt;p&gt;A utopia without a modern economy was repeatedly verified as a failure in the 1960s–70s. The fundamental reason was not that &amp;ldquo;technology wasn&amp;rsquo;t advanced enough&amp;rdquo; at the time, but that the problems of information and incentives were structurally unsolvable:
&lt;strong&gt;Even with powerful centralized computing capability, you cannot replace the preference information transmitted by dispersed individuals through price mechanisms, nor can you sustain innovation drive, responsibility constraints, and resource allocation efficiency over the long term.&lt;/strong&gt; AGI can improve the computational capacity of centralized decision-making, but it cannot eliminate the fundamental economic question of &amp;ldquo;who is responsible for decisions, who bears consequences, who holds the right to choose.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Therefore, AGI is not a replacement for the modern economy, but an amplifier within the modern economy&amp;rsquo;s framework. Any &amp;ldquo;techno-utopia&amp;rdquo; that detaches from market mechanisms, incentive structures, and institutional constraints, whether AGI is introduced or not, will essentially replay the historical path of failure — just in more subtle forms and at higher cost.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Productivity (including the intellectual enhancement brought by AGI) is only one of the conditions required for utopia, and far from the most critical one. Utopia is not a computing power problem, nor an intelligence problem — it is a problem of the stability of human behavior under institutional constraints.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;The Mathematical Foundation for Why AI Cannot Solve Everything
 &lt;div id="the-mathematical-foundation-for-why-ai-cannot-solve-everything" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-mathematical-foundation-for-why-ai-cannot-solve-everything" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The following is excerpted from Wu Jun&amp;rsquo;s &lt;em&gt;The Beauty of Mathematics&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;In 1900, Hilbert posed many problems, one of which was: &amp;lsquo;Can any (polynomial) Diophantine equation be determined, through a finite number of operations, to have integer solutions or not?&amp;rsquo; If the universal answer to Hilbert&amp;rsquo;s question is negative, then it means that for many mathematical problems, even God doesn&amp;rsquo;t know whether an answer exists — because the Diophantine equation solving problem is only a very small part of all mathematical problems. For problems whose very answer-existence cannot be determined, the answer naturally cannot be found. It was precisely Hilbert&amp;rsquo;s contemplation of the boundaries of mathematical problems that made Turing understand the limits of computation&amp;hellip; Matiyasevich rigorously proved that, except for a very small number of special cases, in general, it is impossible to determine through finite operations whether a Diophantine equation has integer solutions. &lt;strong&gt;The resolution of this problem had a far greater impact on human cognition than its mathematical influence&amp;hellip; If even the solution&amp;rsquo;s existence is unknown, it&amp;rsquo;s even more impossible to solve them through computation.&lt;/strong&gt;&amp;rdquo;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d18842a780aa.png" alt="image-20260120211018586" /&gt;&lt;/p&gt;
&lt;p&gt;&amp;ldquo;A rational-state Turing machine can only solve a subset of problems that have answers&amp;hellip; Many engineering problems are not artificial intelligence problems&amp;hellip; &lt;strong&gt;Today, what we should worry about is not how powerful artificial intelligence or computers are, much less should we think they are omnipotent, because their boundaries have already been clearly delineated by the boundaries of mathematics&amp;hellip;&lt;/strong&gt; There are still many problems in the world that need to be solved by humans. How to make good use of AI tools to more effectively solve human problems is what deserves more attention.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;(See? Reading is useful, right? It explains it clearly to you directly — you probably couldn&amp;rsquo;t ask the right question or get such an accessible answer. See? Following hardcore tech content creators is useful — I filtered it for you. Hit that follow button &amp;#x2b50;)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Conclusion
 &lt;div id="conclusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#conclusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;As a technical blogger, I rarely write about such social issues. I originally just wanted to briefly write about why my previous article got traffic, but explaining this phenomenon somewhat expanded the scope of the problem &amp;#x1f613;.&lt;/p&gt;
&lt;p&gt;Limitations of this article:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Only discussed a very small part of DBA work — incident recovery — without discussing the intelligentization of other tasks.&lt;/li&gt;
&lt;li&gt;GPT knows me too well and seems to be flattering me. It indeed makes very valid points, but I cannot endorse what it says. This is somewhat circular: AI helps me confirm that AI cannot endorse things — an output that inherently cannot be endorsed. From my own perspective, its reasoning is indeed good, with quotable lines throughout.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some Ops scenarios are certainly easy to AI-ify. But through the discussion in this article, AI-ifying the incident recovery domain still faces considerable difficulty. &lt;strong&gt;I have never given up on using AI, nor have I ever given up on using the human brain.&lt;/strong&gt; I simply enjoy identifying in which scenarios AI works well, in which it works poorly, and in which it cannot be used at all. This may give the article a tone that seems pessimistic about AI&amp;rsquo;s future, but my thinking is not pessimistic.&lt;/p&gt;
&lt;p&gt;At the beginning of this article, you can see the AI rate is 50%. In reality, I also discussed similar issues with several friends and included my own thinking, so the true intellectual composition of this article is:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;AI rate 50%, other human brain rate 10%, my brain rate 40%&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;So this article is also a typical case of &amp;ldquo;not giving up on using AI, nor giving up on using the human brain.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Let me conclude with a few questions to briefly state my views:&lt;/p&gt;
&lt;p&gt;Why do people still love reading human-written articles? Psychological preference and attention alignment.&lt;/p&gt;
&lt;p&gt;Is reading useful (not just books)? Useful, and more useful than ever (bad books are more useless than ever; knowledge taste is more important than ever).&lt;/p&gt;
&lt;p&gt;Will AIOps be realized? Yes, but it will take time, and it won&amp;rsquo;t be easy. This requires academic breakthroughs and the thinking and practice of operations (including DBAs).&lt;/p&gt;
&lt;p&gt;Will DBAs be replaced? No. Like software developers, they will experience changes in work patterns but will not disappear.&lt;/p&gt;
&lt;p&gt;Which DBAs will remain? &amp;ldquo;Those who understand both DB and AI, who don&amp;rsquo;t depend on AI, yet don&amp;rsquo;t give up on judgment.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Will AGI be realized? Yes.&lt;/p&gt;
&lt;p&gt;Will AGI achieve universal prosperity? No.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;If you&amp;rsquo;d like to discuss AI Ops or the issues in this article with me, you can find me in various PG groups — I should be easy to find. You can also leave me a message.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;ref
 &lt;div id="ref" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ref" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/TsinghuaDatabaseGroup/AIDB" target="_blank" rel="noreferrer"&gt;https://github.com/TsinghuaDatabaseGroup/AIDB&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/urqh4NZDmkXvDllBCCdZDA" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/urqh4NZDmkXvDllBCCdZDA&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Zhao, Y., et al. (2025). &amp;ldquo;STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds&amp;rdquo;. Advances in Neural Information Processing Systems (NeurIPS)&lt;/p&gt;
&lt;p&gt;&lt;a href="https://zhuanlan.zhihu.com/p/631632685" target="_blank" rel="noreferrer"&gt;https://zhuanlan.zhihu.com/p/631632685&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The Beauty of Mathematics&lt;/em&gt; (《数学之美》)&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Operations Experience 2025</title><link>https://lastdba.com/en/2026/01/11/postgresql-operations-experience-2025/</link><pubDate>Sun, 11 Jan 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/01/11/postgresql-operations-experience-2025/</guid><description>&lt;p&gt;This is a technical operations summary, focused on being accessible and practical. It also serves as a periodic reflection on PostgreSQL database operations. Hope it helps fellow PGers.&lt;/p&gt;
&lt;p&gt;Previous ops experience: &lt;a href="https://www.modb.pro/db/1876933230968975360" target="_blank" rel="noreferrer"&gt;PostgreSQL Operations Experience 2024&lt;/a&gt;. Note: this article does not repeat content from that one.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CPU
 &lt;div id="cpu" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cpu" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL performance problems are the most common root cause in PostgreSQL incident handling. This includes poor SQL performance, suboptimal indexing, sudden high concurrency, and execution plan regressions. For a database like PostgreSQL that lacks a robust plan-binding mechanism, having a DBA team to help design data models, access patterns, indexes, and tune execution plans is crucial — it can significantly reduce sudden CPU saturation incidents.&lt;/p&gt;</description><content:encoded>&lt;p&gt;This is a technical operations summary, focused on being accessible and practical. It also serves as a periodic reflection on PostgreSQL database operations. Hope it helps fellow PGers.&lt;/p&gt;
&lt;p&gt;Previous ops experience: &lt;a href="https://www.modb.pro/db/1876933230968975360" target="_blank" rel="noreferrer"&gt;PostgreSQL Operations Experience 2024&lt;/a&gt;. Note: this article does not repeat content from that one.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CPU
 &lt;div id="cpu" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cpu" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL performance problems are the most common root cause in PostgreSQL incident handling. This includes poor SQL performance, suboptimal indexing, sudden high concurrency, and execution plan regressions. For a database like PostgreSQL that lacks a robust plan-binding mechanism, having a DBA team to help design data models, access patterns, indexes, and tune execution plans is crucial — it can significantly reduce sudden CPU saturation incidents.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Execution Plans
 &lt;div id="execution-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#execution-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Execution plan instability is an age-old problem with cost-based optimizers, and PostgreSQL is no exception.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Inaccurate DISTINCT Estimates
 &lt;div id="inaccurate-distinct-estimates" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inaccurate-distinct-estimates" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1976119963471589376" target="_blank" rel="noreferrer"&gt;Case Study: From Inaccurate DISTINCT to DISTINCT Calculation Principles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The default maximum sample size is 30,000 rows. For tables exceeding this size, the estimated distinct count is likely to be low. Note: this assumes the data doesn&amp;rsquo;t have too many unique values.&lt;/p&gt;
&lt;p&gt;Testing on a table with different sample sizes:&lt;/p&gt;
&lt;p&gt;Table: &lt;code&gt;reltuples&lt;/code&gt;=800 million, &lt;code&gt;relpages&lt;/code&gt;=20 million, size=175GB, actual distinct on the target column: 100 million.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;target statistics&lt;/th&gt;
 &lt;th&gt;pages sampling rate&lt;/th&gt;
 &lt;th&gt;tuples sampling rate&lt;/th&gt;
 &lt;th&gt;n_distinct&lt;/th&gt;
 &lt;th&gt;execution time&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;50&lt;/td&gt;
 &lt;td&gt;0.00075&lt;/td&gt;
 &lt;td&gt;0.00001875&lt;/td&gt;
 &lt;td&gt;60K&lt;/td&gt;
 &lt;td&gt;2s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;100&lt;/td&gt;
 &lt;td&gt;0.0015&lt;/td&gt;
 &lt;td&gt;0.0000375&lt;/td&gt;
 &lt;td&gt;110K&lt;/td&gt;
 &lt;td&gt;5s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;1000&lt;/td&gt;
 &lt;td&gt;0.015&lt;/td&gt;
 &lt;td&gt;0.000375&lt;/td&gt;
 &lt;td&gt;1.03M&lt;/td&gt;
 &lt;td&gt;58s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3000&lt;/td&gt;
 &lt;td&gt;0.045&lt;/td&gt;
 &lt;td&gt;0.001125&lt;/td&gt;
 &lt;td&gt;2.68M&lt;/td&gt;
 &lt;td&gt;3m01s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10000&lt;/td&gt;
 &lt;td&gt;0.15&lt;/td&gt;
 &lt;td&gt;0.00375&lt;/td&gt;
 &lt;td&gt;6.75M&lt;/td&gt;
 &lt;td&gt;7m21s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(&lt;code&gt;target statistics&lt;/code&gt; max value: 10000)&lt;/p&gt;
&lt;p&gt;Rough summary: n_distinct and analyze execution time grow proportionally with sample size.&lt;/p&gt;
&lt;p&gt;n_distinct increases with sample size, while pages and tuples estimates remain consistently accurate.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Generic Plan Interference
 &lt;div id="generic-plan-interference" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#generic-plan-interference" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL execution plans must account for generic plans. A generic plan is parameter-independent — it uses default values to compute cost, then compares against the first five custom plan costs; whichever is cheaper wins.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1964312913808732160" target="_blank" rel="noreferrer"&gt;Case Study: Adding an Index Causes Performance Degradation and Generic Plans&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. Classification of generic plan estimation problems&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Because of the 5-execution comparison mechanism, generic plan problems fall into two categories:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The first 5 SQL executions are not representative. Heavily dependent on data skew and whether the first 5 parameter values are representative.&lt;/li&gt;
&lt;li&gt;The generic plan itself is flawed. Due to data skew or inability to accurately compute selectivity even with balanced data, the generic plan is inherently inefficient.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;II. Solution reference&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Generic plan problems often surface on partitioned tables. When the partition key is continuous, scanning all partitions should yield a selectivity of 1, but the generic plan estimates 0.05 — likely resulting in a &amp;ldquo;full index scan&amp;rdquo; scenario.&lt;/p&gt;
&lt;p&gt;Consider these when optimizing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t create too many indexes that confuse the optimizer&lt;/li&gt;
&lt;li&gt;Eliminate generic plan interference. Execute the prepared statement 6 times for real&lt;/li&gt;
&lt;li&gt;Compare plans with session-level &lt;code&gt;set plan_cache_mode='force_generic_plan';&lt;/code&gt; or &lt;code&gt;set plan_cache_mode='force_custom_plan';&lt;/code&gt;; or on PG 16+, use &lt;code&gt;explain (GENERIC_PLAN)&lt;/code&gt; to compare&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syntax reference:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--prepare/execute
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PREPARE sql1(text) AS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SELECT COUNT(*) FROM LZL where a=$1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXECUTE sql1(&amp;#39;zzz&amp;#39;); --run 6 times first
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXPLAIN EXECUTE sql1(&amp;#39;zzz&amp;#39;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select * from pg_prepared_statements --view prepared statement info, current session only
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--Compare execution plans, set session parameter then EXPLAIN EXECUTE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;set plan_cache_mode=&amp;#39;force_generic_plan&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;set plan_cache_mode=&amp;#39;force_custom_plan&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--Directly view generic plan, 16+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;explain (GENERIC_PLAN) xx &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;LWLock:Lockmanager Caused by Row Locks
 &lt;div id="lwlocklockmanager-caused-by-row-locks" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lwlocklockmanager-caused-by-row-locks" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;LWLock Lockmanager&lt;/code&gt; issues typically occur on partitioned tables under high concurrency with queries lacking partition keys. This year, a new scenario was discovered: &lt;a href="https://www.modb.pro/db/1995089823380627456" target="_blank" rel="noreferrer"&gt;Row Locks Causing LWLock:Lockmanager&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This isn&amp;rsquo;t a major issue — blocking on concurrent updates to the same row is well known. I just hadn&amp;rsquo;t expected that updating the same row could also produce &lt;code&gt;LWLock:Lockmanager&lt;/code&gt;. Not a particularly valuable case study, but when you see &lt;code&gt;LWLock:Lockmanager&lt;/code&gt; as a wait event, consider row locks.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Idle Connections
 &lt;div id="idle-connections" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#idle-connections" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL performance generally improves with each major release. PG 14 made &lt;a href="https://liuzhilong.blog.csdn.net/article/details/130783036" target="_blank" rel="noreferrer"&gt;significant optimizations&lt;/a&gt; to snapshot acquisition and backend transaction tracking, yielding noticeable improvements for high idle connection counts:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/88df744da257.jpg" alt="performance-impact-of-idle-connections-48active-prepost.png" /&gt; (&lt;a href="https://techcommunity.microsoft.com/blog/adforpostgresql/improving-postgres-connection-scalability-snapshots/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/blog/adforpostgresql/improving-postgres-connection-scalability-snapshots/1806462&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;However, this doesn&amp;rsquo;t mean you can ignore idle connections after PG 14. They still consume backend transaction maintenance overhead, cause context switches, fragment memory, etc. — the more idle connections, the worse the performance.&lt;/p&gt;
&lt;p&gt;Typically, application connections have keepalive and pooling. Maintaining some idle connections avoids creating new connections for every request, which would be far more expensive. Small databases generally don&amp;rsquo;t need to worry much about connection counts (as long as they&amp;rsquo;re not absurd) — CPUs are cheap, the system isn&amp;rsquo;t critical, and scaling is easy. But large databases are different. CPU count is the hard limit; you can&amp;rsquo;t just add more. Large databases already have many idle connections; adding more doesn&amp;rsquo;t necessarily increase throughput — when CPU is already tight, it can backfire.&lt;/p&gt;
&lt;p&gt;PG 15 benchmark experience: with 5K idle as baseline, increasing to 10K idle adds ~2-5 vCPU overhead for idle maintenance; 20K idle adds ~5-10 vCPU. Approximate.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Idle in Transaction
 &lt;div id="idle-in-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#idle-in-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Last year I thoroughly criticized long transactions, because they impact PostgreSQL more severely than other databases (Oracle, MySQL, etc.). But this is manageable — with proper alerting and operations, long transactions are solvable.&lt;/p&gt;
&lt;p&gt;When monitoring session states, you need to check them. &lt;code&gt;active&lt;/code&gt; means running SQL, &lt;code&gt;idle in transaction&lt;/code&gt; means in a transaction but not currently executing SQL. All &lt;a href="https://www.postgresql.org/docs/18/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW" target="_blank" rel="noreferrer"&gt;pg_stat_activity states, PG 15&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;Current overall state of this backend. Possible values are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;active&lt;/code&gt;: The backend is executing a query.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idle&lt;/code&gt;: The backend is waiting for a new client command.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idle in transaction&lt;/code&gt;: The backend is in a transaction, but is not currently executing a query.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idle in transaction (aborted)&lt;/code&gt;: This state is similar to &lt;code&gt;idle in transaction&lt;/code&gt;, except one of the statements in the transaction caused an error.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fastpath function call&lt;/code&gt;: The backend is executing a fast-path function.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;disabled&lt;/code&gt;: This state is reported if &lt;a href="https://www.postgresql.org/docs/15/runtime-config-statistics.html#GUC-TRACK-ACTIVITIES" target="_blank" rel="noreferrer"&gt;track_activities&lt;/a&gt; is disabled in this backend.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Common states are: &lt;code&gt;active&lt;/code&gt;, &lt;code&gt;idle&lt;/code&gt;, &lt;code&gt;idle in transaction&lt;/code&gt;, &lt;code&gt;idle in transaction (aborted)&lt;/code&gt;. A common misconception about &lt;code&gt;idle in transaction&lt;/code&gt;: it only means no SQL is running &lt;em&gt;right now&lt;/em&gt; and the transaction hasn&amp;rsquo;t committed — it does NOT mean the transaction has been idle for a long time. Don&amp;rsquo;t use &lt;code&gt;xact_start&lt;/code&gt; + &lt;code&gt;idle in transaction&lt;/code&gt; to judge how long a transaction has been idle. Use &lt;code&gt;state_change&lt;/code&gt; + &lt;code&gt;idle in transaction&lt;/code&gt; instead.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory
 &lt;div id="memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Memory issues are extremely tricky, and I handled many this year, finding some good solutions. But memory knowledge is broad — I&amp;rsquo;ll try to simplify as much as possible, going straight to symptoms, results, and solutions.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Memory Issues and Huge Pages
 &lt;div id="memory-issues-and-huge-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-issues-and-huge-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Classification of PostgreSQL memory problems:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/1fdf8b816eb0.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;p&gt;Relevant wchan states for PG memory issues:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c2d5d422e6f9.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;p&gt;Huge pages are very effective against memory fragmentation and direct memory reclaim within cgroups.&lt;/p&gt;
&lt;p&gt;Benchmark results for huge pages: &lt;a href="https://docs.paic.com.cn/#/post/84479375" target="_blank" rel="noreferrer"&gt;https://docs.paic.com.cn/#/post/84479375&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Theoretical benefits of huge pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduced TLB pressure&lt;/li&gt;
&lt;li&gt;Reduced page table size in main memory&lt;/li&gt;
&lt;li&gt;Huge pages are physically contiguous. Contiguous physical memory access is better than non-contiguous&lt;/li&gt;
&lt;li&gt;With huge pages, pages are directly mapped without multi-level PTE entries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, huge pages bring management challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Must pre-allocate huge pages&lt;/li&gt;
&lt;li&gt;Must calculate huge page size in advance to avoid memory waste&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Memory knowledge is extensive. For more, refer to &lt;a href="https://lastdba.com/en/2025/06/19/linux%E5%86%85%E5%AD%98%E8%BF%9B%E9%98%B6/" &gt;Advanced Linux Memory&lt;/a&gt;. Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rule out OS-level issues before tackling PG instance-level issues&lt;/li&gt;
&lt;li&gt;Huge pages have remarkable effects, but in rare cases they don&amp;rsquo;t help&lt;/li&gt;
&lt;li&gt;Many people don&amp;rsquo;t monitor pgpgin/pgpgout/pgfree, or even pgscank/pgscand — they only look at CPU and memory usage. That&amp;rsquo;s insufficient for operating PostgreSQL.&lt;/li&gt;
&lt;li&gt;Without good operational practices, PG memory can be very unstable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Notable Cgroup Knowledge
 &lt;div id="notable-cgroup-knowledge" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#notable-cgroup-knowledge" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Cgroup knowledge is also extensive. Refer to earlier articles; here&amp;rsquo;s a quick summary.&lt;/p&gt;
&lt;p&gt;Cgroup v1 has inherent flaws:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Does not account for cgroup page tables&lt;/li&gt;
&lt;li&gt;Does not account for cgroup slab&lt;/li&gt;
&lt;li&gt;Does not account for cgroup huge pages (huge pages are not charged, not just uncounted)&lt;/li&gt;
&lt;li&gt;Does not account for cgroup async/sync page reclaim&lt;/li&gt;
&lt;li&gt;Cgroup RSS and process RSS have inconsistent accounting methods&lt;/li&gt;
&lt;li&gt;shmem accounting is messy&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Unsolved Mysteries
 &lt;div id="unsolved-mysteries" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#unsolved-mysteries" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Huge pages have solved many problems, but not all. The unsolved portion remains to be researched — hopefully clarified in 2026.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Pay Attention to the OS
 &lt;div id="pay-attention-to-the-os" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pay-attention-to-the-os" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Pay Attention to Everything OS
 &lt;div id="pay-attention-to-everything-os" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pay-attention-to-everything-os" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;To operate open-source databases well, you need to understand the operating system.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;(Source forgotten)&lt;/p&gt;
&lt;p&gt;To operate PostgreSQL well, understanding OS principles is essential. PostgreSQL is built on top of the OS (especially Linux) — it uses whatever Linux provides. PostgreSQL is part of the Linux ecosystem. To truly understand how it works, understand the OS first.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Rule out OS-level issues before tackling PG instance-level issues.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;(My own words)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. CPU&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Since PostgreSQL doesn&amp;rsquo;t use NUMA, whether on bare metal or cgroup/pod-managed CPU, you rarely need to dive into OS-level CPU internals. CPU issues can mostly be diagnosed from SQL or PG stack traces.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;II. Memory&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;See the Memory section. Memory issues require OS-level investigation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;III. Processes&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Inspecting PG process states from the OS is critical. You need to check D state, wchan, RSS, syscalls, at minimum.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IV. Host Status and Logs&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Monitor host status — CPU, memory, IO, network, logs at the host level. Very important.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s hard to imagine that a vague network IO alert like &amp;ldquo;an I/O error occurred while sending to the backend&amp;rdquo; is related to underlying storage. Beyond &lt;code&gt;/var/log/messages&lt;/code&gt;, PG itself shows nothing. (Of course, this error may have other causes — don&amp;rsquo;t misinterpret.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;V. Others&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Uncategorized.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Physical Reads
 &lt;div id="physical-reads" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#physical-reads" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL itself does &lt;strong&gt;not directly expose a &amp;ldquo;true physical disk read&amp;rdquo; metric&lt;/strong&gt;. The various reads in &lt;code&gt;pg_stat_*&lt;/code&gt; (e.g., &lt;code&gt;pg_stat_database.blks_read&lt;/code&gt;) are reads from the OS cache.&lt;/p&gt;
&lt;p&gt;So how do you monitor physical reads?&lt;/p&gt;
&lt;p&gt;Reads or buffer allocation metrics are supplementary. The best approach is monitoring the OS itself.&lt;/p&gt;
&lt;p&gt;The OS is PostgreSQL&amp;rsquo;s ecosystem. Never look at the database in isolation. Not being able to monitor physical reads at the database level is nothing to be ashamed of — as long as you have a solution.&lt;/p&gt;
&lt;p&gt;Monitor iostat and other disk metrics. For cloud environments, OS-level observability is already mature — don&amp;rsquo;t waste cloud-native observability.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Autovacuum
 &lt;div id="autovacuum" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#autovacuum" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL for monitoring autovacuum processes: &lt;a href="https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0067_autovacuum_queue_and_progress.md" target="_blank" rel="noreferrer"&gt;sql autovacuum_queue_and_progress&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Autovacuum Freeze on Large Databases
 &lt;div id="autovacuum-freeze-on-large-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#autovacuum-freeze-on-large-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;With properly configured parameters, monitoring, and alerting, autovacuum freeze requires little attention in most databases.&lt;/p&gt;
&lt;p&gt;However, in databases with extremely high transaction throughput and very large data volumes, you still can&amp;rsquo;t ignore it. Autovacuum prevent wraparound may be running constantly. At minimum, watch these two points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Age alerting: handle promptly and try to prevent the next alert. Don&amp;rsquo;t wait until the last moment to panic (acceleration options depend on version, e.g., &lt;code&gt;INDEX_CLEANUP OFF&lt;/code&gt;, &lt;code&gt;BUFFER_USAGE_LIMIT&lt;/code&gt; adjustments)&lt;/li&gt;
&lt;li&gt;Impact on memory (especially cache). If autovacuum runs nonstop on a very large database, it impacts cache and memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For principles and parameters, see this howtos diagram:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c216c393371f.jpg" alt="Wraparound and freeze" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Large Tables That Won&amp;rsquo;t Finish Vacuuming
 &lt;div id="large-tables-that-wont-finish-vacuuming" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#large-tables-that-wont-finish-vacuuming" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&amp;ldquo;Large tables&amp;rdquo; means hundreds of GB, typically with many indexes and dead tuples that prevent vacuum from completing.&lt;/p&gt;
&lt;p&gt;The main bottleneck: (auto)vacuum cleans dead index tuples one by one per dead row. Large table (auto)vacuum is slow here — you&amp;rsquo;ll typically see many dead tuples on the table. Worse, (auto)vacuum may run slower than the rate of dead tuple generation — vacuum never finishes, infinite bloat.&lt;/p&gt;
&lt;p&gt;Experience with large tables that can&amp;rsquo;t finish:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For the same table, dead tuple count is &lt;em&gt;roughly&lt;/em&gt; proportional to execution time&lt;/li&gt;
&lt;li&gt;From autovacuum log&amp;rsquo;s user time and elapsed time, you can observe CPU time and execution time, and roughly estimate delay sleep time&lt;/li&gt;
&lt;li&gt;Disabling autovacuum cost-based delay can reduce execution time by ~3× (index-size dependent; based on a 200GB table with 280GB indexes)&lt;/li&gt;
&lt;li&gt;Adjusting a table&amp;rsquo;s autovacuum cost-based delay means letting autovacuum rest less when processing that table — consuming more CPU and scan IO in a shorter time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;How to accelerate?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Repack&lt;/strong&gt;. Repack is a nuclear option — fast table rebuild for emergencies. But repack is a CLI tool; running it manually each time is cumbersome.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tune autovacuum cost-based delay parameters&lt;/strong&gt;. Either 1. Increase cost limit: &lt;code&gt;alter table t1 SET (autovacuum_vacuum_cost_limit=1000);&lt;/code&gt;, or 2. Disable delay entirely: &lt;code&gt;alter table t1 SET (autovacuum_vacuum_cost_delay=0);&lt;/code&gt;. Recommended only for tables that can&amp;rsquo;t keep up.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Drop unnecessary indexes&lt;/strong&gt;. Scanning indexes and updating index entries takes the most time — dropping unnecessary indexes is effective.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Partitioned tables&lt;/strong&gt;. Recommended partition size ≤10GB. &lt;em&gt;Converting to partitioned tables is the best solution.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Drop updated_time column indexes&lt;/strong&gt; to leverage HOT, reducing bloat rate.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Checkpoint and Bgwriter
 &lt;div id="checkpoint-and-bgwriter" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checkpoint-and-bgwriter" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The checkpointer not only creates checkpoints (affecting recovery time) but also flushes dirty buffers. The bgwriter only flushes dirty buffers. Starting from PG 17, some metrics moved to &lt;code&gt;pg_stat_checkpointer&lt;/code&gt;. For PG ≤16, mainly look at &lt;code&gt;pg_stat_bgwriter&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. Checkpoint intervals&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Metric &lt;code&gt;checkpoints_timed&lt;/code&gt;: corresponds to &lt;code&gt;checkpoint_timeout&lt;/code&gt; parameter&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;checkpoints_req&lt;/code&gt;: corresponds to &lt;code&gt;max_wal_size&lt;/code&gt; parameter&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Recommend using &lt;code&gt;checkpoint_timeout&lt;/code&gt; as the primary checkpoint interval. If &lt;code&gt;checkpoints_req&lt;/code&gt; appears, increase &lt;code&gt;max_wal_size&lt;/code&gt; and tune flush parameters accordingly. When FPIs are present, also check these two metrics.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;II. Flush metrics&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_checkpoint&lt;/code&gt;: dirty buffers flushed by checkpointer&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_clean&lt;/code&gt;: dirty buffers flushed by bgwriter&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_backend&lt;/code&gt;: dirty buffers flushed by backends — should be as close to zero as possible; occurrence means bgwriter isn&amp;rsquo;t aggressive enough&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;buffers_backend_fsync&lt;/code&gt;: meaning unclear&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The tuning goal is flush priority: &lt;strong&gt;bgwriter flush &amp;gt; checkpointer flush &amp;gt; backend flush&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The checkpointer can flush as a side effect, but checkpointer flush speed is hard to control — it can cause IO spikes. So bgwriter flush priority should be higher than checkpointer. Backend flush is obviously worst — minimize it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;III. Bgwriter flush parameters&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Bgwriter controls flush speed through a &amp;ldquo;write some, pause, write again&amp;rdquo; cycle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parameter &lt;code&gt;bgwriter_delay&lt;/code&gt;: how long to pause&lt;/li&gt;
&lt;li&gt;Parameter &lt;code&gt;bgwriter_lru_maxpages&lt;/code&gt;: max pages to write per cycle&lt;/li&gt;
&lt;li&gt;Parameter &lt;code&gt;bgwriter_lru_multiplier&lt;/code&gt;: pages per cycle = (recent buffer allocation × lru_multiplier), capped at lru_maxpages&lt;/li&gt;
&lt;li&gt;Parameter &lt;code&gt;bgwriter_flush_after&lt;/code&gt;: fsync after writing this many buffers&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;pg_buffers_alloc&lt;/code&gt;: represents shared memory buffer allocation (allocation means actual eviction occurred, somewhat indicative of pgpgin)&lt;/li&gt;
&lt;li&gt;Metric &lt;code&gt;maxwritten_clean&lt;/code&gt;: number of times &lt;code&gt;bgwriter_lru_maxpages&lt;/code&gt; was reached&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Default bgwriter flush logic: &lt;strong&gt;each cycle: flush (new buffer count × 2, max 100 dirty buffers), delay 200ms, fsync every 64 buffers flushed&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Per-cycle flush volume depends on recent buffer allocation and &lt;code&gt;bgwriter_lru_multiplier&lt;/code&gt;. During peak times, buffer allocation is typically high, so it usually hits &lt;code&gt;bgwriter_lru_maxpages&lt;/code&gt;. Thus: &lt;strong&gt;&lt;code&gt;bgwriter_lru_maxpages&lt;/code&gt; caps peak flush volume; &lt;code&gt;bgwriter_lru_multiplier&lt;/code&gt; prevents excessive flushing during off-peak times&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IV. Flush parameter reference&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Default max bgwriter flush = 100 × 5 × 8KB = 3.9MB/s. The defaults are definitely too low. If tuning upward, adjust based on &lt;code&gt;shared_buffers&lt;/code&gt; size and workload.&lt;/p&gt;
&lt;p&gt;After all that theory, here&amp;rsquo;s a practical reference:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Read/write ratio 2:8, high load&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_buffers&lt;span style="color:#f92672"&gt;=&lt;/span&gt;40GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;checkpoint_timeout&lt;span style="color:#f92672"&gt;=&lt;/span&gt;20min;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_wal_size&lt;span style="color:#f92672"&gt;=&lt;/span&gt;80GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bgwriter_delay&lt;span style="color:#f92672"&gt;=&lt;/span&gt;20ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bgwriter_lru_maxpages&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bgwriter_lru_multiplier&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Adjust further as needed.&lt;/p&gt;
&lt;p&gt;As for effects: from practical experience, don&amp;rsquo;t expect standalone bgwriter tuning to yield great results. Overly aggressive bgwriter tuning can even backfire.&lt;/p&gt;
&lt;p&gt;So: &lt;strong&gt;If your database hasn&amp;rsquo;t been clearly diagnosed with checkpoint flush spikes or other flush issues, don&amp;rsquo;t touch this.&lt;/strong&gt; Only recommended for core large databases with high concurrency, as a supplementary tuning strategy alongside other changes (migrations, shared_buffer adjustments, etc.).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;V. Flush parameter summary&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Bgwriter flushing can be summarized as &amp;ldquo;three hard&amp;rsquo;s&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;Hard to understand, hard to tune, hard to see results.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;DB4AI
 &lt;div id="db4ai" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#db4ai" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;AI Task Scheduling Writes to Database
 &lt;div id="ai-task-scheduling-writes-to-database" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ai-task-scheduling-writes-to-database" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;AI applications are widely deployed at the development level. One scenario: AI task invocations write to the database. Task invocations can spike instantly, and the database writes may lack concurrency control, causing CPU or other resource spikes.&lt;/p&gt;
&lt;p&gt;This is a new database incident pattern in the AI era. Be careful.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Vector HNSW
 &lt;div id="vector-hnsw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector-hnsw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Reference: &lt;a href="https://postgresql.us/events/pgconfnyc2024/sessions/session/1862/slides/172/pgvector_best_practices_pgconfnyc2024.pdf" target="_blank" rel="noreferrer"&gt;https://postgresql.us/events/pgconfnyc2024/sessions/session/1862/slides/172/pgvector_best_practices_pgconfnyc2024.pdf&lt;/a&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Index Build Acceleration
 &lt;div id="hnsw-index-build-acceleration" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-build-acceleration" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;HNSW index builds can be extremely slow — millions of rows can take hours.&lt;/p&gt;
&lt;p&gt;Factors affecting HNSW build speed include instance memory (and CPU) as well as index build parameters:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;maintenance_work_mem&lt;span style="color:#f92672"&gt;=&lt;/span&gt;3g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_parallel_maintenance_workers&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;m&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ef_construction&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Building HNSW indexes can be painful. Ways to accelerate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Building the index before data load is an option. Though the total initial time is slower, developers may accept &amp;ldquo;a bit slower&amp;rdquo; but cannot accept &amp;ldquo;index building for 1 hour.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Optimizing post-load index builds:
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;SET maintenance_work_mem = '8GB'&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SET max_parallel_maintenance_workers = 8&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Post-load index builds need attention to memory — strongly related to instance memory and free memory.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Note: &lt;code&gt;maintenance_work_mem&lt;/code&gt; can protect OS memory. If &lt;code&gt;maintenance_work_mem&lt;/code&gt; exceeds available OS memory and the table is large, the connection is terminated immediately (fast failure):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;53200&lt;/span&gt;: could &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; resize shared memory segment &lt;span style="color:#e6db74"&gt;&amp;#34;/PostgreSQL.1390017142&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6439348672&lt;/span&gt; bytes: Cannot &lt;span style="color:#66d9ef"&gt;allocate&lt;/span&gt; memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: dsm_impl_posix, dsm_impl.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;314&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Note: if memory used during build exceeds &lt;code&gt;maintenance_work_mem&lt;/code&gt;, an info notice appears (after some time):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NOTICE: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: hnsw graph &lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; longer fits &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; maintenance_work_mem &lt;span style="color:#66d9ef"&gt;after&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;886990&lt;/span&gt; tuples
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: Building will take significantly &lt;span style="color:#66d9ef"&gt;more&lt;/span&gt; time.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Increase maintenance_work_mem &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; speed up builds.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: InsertTuple, hnswbuild.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;525&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 class="relative group"&gt;HNSW Index Query Performance
 &lt;div id="hnsw-index-query-performance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-query-performance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Query recall and performance need to be balanced via the &lt;code&gt;ef_search&lt;/code&gt; parameter.&lt;/p&gt;
&lt;p&gt;Besides &lt;code&gt;ef_search&lt;/code&gt;, one more factor significantly impacts query speed: &lt;strong&gt;whether the HNSW index is cached in memory&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Index NOT in memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; image_id, applyNo, feature_vector &lt;span style="color:#f92672"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; vectorsit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; image_features_test2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; distance
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11852&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11865&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;073&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;185&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1796&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9309&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I&lt;span style="color:#f92672"&gt;/&lt;/span&gt;O Timings: shared&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;local&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;82108&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;559&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;008&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;009&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; test_0 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1360&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;007&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;008&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_feature_hnsw &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; image_features_test2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11852&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;78&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1292546&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;989705&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;071&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;By&lt;/span&gt;: (feature_vector &lt;span style="color:#f92672"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1796&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9309&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I&lt;span style="color:#f92672"&gt;/&lt;/span&gt;O Timings: shared&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;local&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;82108&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;559&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;130&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;82193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;279&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Index IN memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11852&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11865&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;240&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;350&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11105&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;007&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;008&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; test_0 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1360&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;007&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;007&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_feature_hnsw &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; image_features_test2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11852&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;78&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1292546&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;989705&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;239&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;344&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;By&lt;/span&gt;: (feature_vector &lt;span style="color:#f92672"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11105&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;093&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;392&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Same index, same execution plan — &lt;strong&gt;the performance difference between index-in-memory and index-not-in-memory is 82193.279 / 20.392 = 4000×!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This gap cannot be ignored. When monitoring HNSW index performance, always check whether the index is in memory. Reference SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Check if HNSW index is cached in shared buffers via pg_buffercache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.relname, pg_size_pretty(&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; buffered, round(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#f92672"&gt;/&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; setting &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_settings &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;shared_buffers&amp;#39;&lt;/span&gt;)::integer, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; buffer_percent, round(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt; pg_table_size(&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid), &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; percent_of_relation &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INNER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_buffercache b &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; b.relfilenode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.relfilenode &lt;span style="color:#66d9ef"&gt;INNER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; (b.reldatabase &lt;span style="color:#f92672"&gt;=&lt;/span&gt; d.oid &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; d.datname &lt;span style="color:#f92672"&gt;=&lt;/span&gt; current_database()) &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid, &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.relname &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; buffered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; buffer_percent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; percent_of_relation 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------+------------+----------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_feature_hnsw_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117&lt;/span&gt; MB &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;91&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_feature_hnsw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;78&lt;/span&gt; MB &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_inherits_parent_index &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Application Releases
 &lt;div id="application-releases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#application-releases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;DDL Tips
 &lt;div id="ddl-tips" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ddl-tips" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Online DDL tools like pg-osc and pg_migrate don&amp;rsquo;t support partitioned tables, and they have other issues — real-world use is difficult. So DDL tips are still useful: lowering lock levels, proactively identifying blocking, etc., to reduce DDL blocking and rewrite risks.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5f610ac9b703.png" alt="picddl" /&gt;&lt;/p&gt;
&lt;p&gt;Key points for understanding this diagram:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Before changes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Ensure no long transactions on the table — long transactions hold locks on tables persistently. Long transactions are a well-known hazard in PG; handle them first.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Ensure no autovacuum (to prevent wraparound) on the table — autovacuum generally doesn&amp;rsquo;t block SQL, except when running &lt;a href="https://www.postgresql.org/docs/18/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;to prevent wraparound&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Autovacuum workers generally don&amp;rsquo;t block other commands. If a process attempts to acquire a lock that conflicts with the &lt;code&gt;SHARE UPDATE EXCLUSIVE&lt;/code&gt; lock held by autovacuum, lock acquisition will interrupt the autovacuum. However, if the autovacuum is running to prevent transaction ID wraparound (i.e., the autovacuum query name in the &lt;code&gt;pg_stat_activity&lt;/code&gt; view ends with &lt;code&gt;(to prevent wraparound)&lt;/code&gt;), the autovacuum is not automatically interrupted.&lt;/p&gt;
&lt;/blockquote&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;lock_timeout=2000&lt;/code&gt; — if a lock cannot be acquired within 2 seconds, bail out to avoid mass blocking.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Special cases for &amp;ldquo;small-to-large&amp;rdquo; type changes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Small-to-large type changes generally don&amp;rsquo;t rewrite the table, but there are exceptions. Pay special attention to &lt;code&gt;int → bigint&lt;/code&gt; (common for PK columns) and &lt;code&gt;char(n) → char(m)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Partitioned table indexes. Small-to-large type changes on partitioned tables don&amp;rsquo;t rewrite the table, but they &lt;strong&gt;do rebuild indexes&lt;/strong&gt; — and rebuilding indexes on partitioned tables is typically very slow, potentially causing prolonged level-8 lock blocking. This behavior is unique to partitioned tables.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Changing column types:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Almost always rewrites the table, except for equivalent types or small-to-large cases.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;DDL lock-level reduction tips:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use CIC (CREATE INDEX CONCURRENTLY) for indexes. If partitions don&amp;rsquo;t support it, do CIC on child tables (remember to attach the index).&lt;/li&gt;
&lt;li&gt;CIC has multiple phases. Phases 2 and 3 acquire a SHARE lock, blocking DML. (Official docs only mention SHARE UPDATE EXCLUSIVE — CIC isn&amp;rsquo;t a simple explicit lock.)&lt;/li&gt;
&lt;li&gt;Add primary keys with &lt;code&gt;USING INDEX&lt;/code&gt;. For partitions, leverage &amp;ldquo;add PK on child table + add PK on parent can merge existing child PKs.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;VALIDATE CONSTRAINT&lt;/code&gt; for constraints.&lt;/li&gt;
&lt;li&gt;PG &amp;lt;17 doesn&amp;rsquo;t support &lt;code&gt;NOT NULL VALIDATE&lt;/code&gt;. Use &lt;code&gt;CHECK(col1 IS NOT NULL)&lt;/code&gt; instead. This CHECK-to-NOT-NULL conversion won&amp;rsquo;t produce extra scans.&lt;/li&gt;
&lt;li&gt;Adding a column with a volatile DEFAULT rewrites the table. Use the non-volatile-no-rewrite property: add the column first (no rewrite), then UPDATE legacy data as needed.&lt;/li&gt;
&lt;li&gt;When attaching partitions, use CHECK constraints to reduce downtime, and use &lt;code&gt;VALIDATE CONSTRAINT&lt;/code&gt; for the CHECK.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CREATE TABLE LIKE&lt;/code&gt; + &lt;code&gt;ATTACH&lt;/code&gt; has much lower lock levels than &lt;code&gt;PARTITION OF&lt;/code&gt; (though I still prefer &lt;code&gt;PARTITION OF&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;After changes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Remember to collect statistics (needed in many scenarios).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Parallel Index Creation
 &lt;div id="parallel-index-creation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#parallel-index-creation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In production, you may need to create indexes on very large tables that take a long time. Parallel index creation can shorten build time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Parallel index creation on regular tables:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Parallel parameter: &lt;code&gt;max_parallel_maintenance_workers&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Prerequisites:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enough workers: check &lt;code&gt;max_parallel_workers&lt;/code&gt;, &lt;code&gt;max_worker_processes&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Increase &lt;code&gt;maintenance_work_mem&lt;/code&gt; to GB scale&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Effective for B-tree and BRIN&lt;/li&gt;
&lt;li&gt;&lt;code&gt;maintenance_work_mem&lt;/code&gt; limits the entire utility command. Unlike parallel query, where resource limits are per worker process.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From test results, parallel index creation shows diminishing returns beyond 8 workers (this conclusion may not hold in all environments).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Parallel index creation on partitioned tables:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Recommend manual parallel creation across child partitions — run index creation on multiple partitions simultaneously rather than using native parallelism. This reduces multi-process coordination overhead.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Cached Plan Must Not Change Resource
 &lt;div id="cached-plan-must-not-change-resource" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cached-plan-must-not-change-resource" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After adding a new column the previous night, application connections started throwing errors the next morning: &amp;ldquo;cached plan must not change result type in PostgreSQL&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Reproduction:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; a(b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; p1 (varchar) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLUMN&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;TYPE&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; p1 (&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;A000: cached plan must &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; change &lt;span style="color:#66d9ef"&gt;result&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: RevalidateCachedQuery, plancache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;718&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Test environment solutions:&lt;/strong&gt;
&lt;code&gt;DEALLOCATE ALL&lt;/code&gt; — actively discard prepared statements
Or,
&lt;code&gt;DISCARD ALL&lt;/code&gt; — actively discard all session state&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DEALLOCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALL&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--DISCARD ALL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; p1 (varchar) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; p1 (&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Production environment solutions:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Since the error occurs at the application layer, JDBC can handle &lt;code&gt;DEALLOCATE ALL&lt;/code&gt; / &lt;code&gt;DISCARD ALL&lt;/code&gt;, but the application may not have implemented this. Immediate production solutions:&lt;/p&gt;
&lt;p&gt;Solutions (choose one):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Since connection pools like HikariCP have connection cycling and timeout mechanisms, killing idle sessions will gradually reduce errors.&lt;/li&gt;
&lt;li&gt;Similarly, due to connection pool cycling, you can do nothing — as the pool gradually establishes new connections, the errors fade.&lt;/li&gt;
&lt;li&gt;If business pressure is high enough, consider killing all application connections.&lt;/li&gt;
&lt;li&gt;Rolling restart of the application.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Not&lt;/strong&gt; recommended:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Restart the application after every DDL.&amp;rdquo; It works but don&amp;rsquo;t recommend this as a standard practice.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autosave=conservative&lt;/code&gt;. It works but enables subtransactions. A savepoint is set for each query; rollback happens only for rare cases like &amp;lsquo;cached statement cannot change return type&amp;rsquo; or &amp;lsquo;statement XXX is not valid,&amp;rsquo; where the JDBC driver rolls back and retries.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;JDBC configuration suggestions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Configure automatic retry after transaction rollback: &lt;a href="https://developer.aliyun.com/article/741750" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/741750&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Other JDBC config references: &lt;a href="https://jdbc.postgresql.org/documentation/server-prepare/#corner-cases" target="_blank" rel="noreferrer"&gt;https://jdbc.postgresql.org/documentation/server-prepare/#corner-cases&lt;/a&gt;. Note: some suggestions are not suitable for production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Physical Replication
 &lt;div id="physical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#physical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Query Conflicts
 &lt;div id="query-conflicts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#query-conflicts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Query conflicts are a notoriously frustrating feature that directly impacts the usability of PG standby queries. Query conflicts increase standby lag, yet long-running queries on the standby are logically reasonable. This forces PG administrators to balance between lag management and long-query management — a problem that doesn&amp;rsquo;t exist in other relational databases.&lt;/p&gt;
&lt;p&gt;Hidden characteristics of query conflicts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Even static tables can trigger query conflicts (&lt;a href="https://www.modb.pro/db/1966415366276526080" target="_blank" rel="noreferrer"&gt;see: From Static Table Query Conflicts to Their Principles&lt;/a&gt;). The conflict is a snapshot conflict, largely unrelated to table-level locks — snapshot conflicts are cross-table.&lt;/li&gt;
&lt;li&gt;Long queries affect short queries. Once a long query pushes standby lag to &lt;code&gt;max_standby_streaming_delay&lt;/code&gt;, even short queries get canceled.&lt;/li&gt;
&lt;li&gt;Continuous short queries also cause query conflicts. For example, one short query hasn&amp;rsquo;t finished when the next starts — the two queries may be logically similar, and the startup process hasn&amp;rsquo;t had time to apply WAL. Both short queries hold the XID that needs to be applied. Check whether &lt;code&gt;pg_stat_activity.backend_xmin&lt;/code&gt; is less than the XID the startup process is applying.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Recommended standby query practices:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Using RTO SLO to tune &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; is a good approach. When arguments lead nowhere, SLO-based IT management saves the day.&lt;/li&gt;
&lt;li&gt;Separate short/fast business queries from long queries (data extraction, reporting) onto different standbys to reduce mutual interference.&lt;/li&gt;
&lt;li&gt;Standby queries still need SQL optimization.&lt;/li&gt;
&lt;li&gt;Standby WAL apply lag must be monitored.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Logical Replication
 &lt;div id="logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Logical replication has countless pitfalls. 2024 had many nasty cases; 2025 had some too, but less severe, mostly on older PG versions. Overall, logical replication on newer PG versions is trending toward stability.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Slow DDL/DCL Parsing on Older PG Versions
 &lt;div id="slow-ddldcl-parsing-on-older-pg-versions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#slow-ddldcl-parsing-on-older-pg-versions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1922232196358746112" target="_blank" rel="noreferrer"&gt;Case Study: GRANT and Walsender Stuck&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;On PG 13 and earlier, certain DDL/DCL statements parse slowly and may affect walsender lag. These include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Batch GRANT (including grant all tables) + pathman extension installed (whether used or not)&lt;/li&gt;
&lt;li&gt;Batch DDL/TRUNCATE/DCL/DROP PUBLICATION&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Older PG + Multiple Replication Links + Flink
 &lt;div id="older-pg--multiple-replication-links--flink" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#older-pg--multiple-replication-links--flink" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Flink requires one link per table. Since PostgreSQL walsenders re-decode independently, dozens of Flink links on one PG database are common — and hard to refactor.&lt;/p&gt;
&lt;p&gt;On PG 11 and earlier, the walsender main loop calls &lt;code&gt;PostmasterIsAlive()&lt;/code&gt;, causing poor loop performance. Starting from PG 12, &lt;code&gt;WalSndLoop&lt;/code&gt; no longer polls &lt;code&gt;PostmasterIsAlive()&lt;/code&gt; in the main loop; instead, status checks are placed inside &lt;code&gt;WalSndWait&lt;/code&gt;, using event-based passive notification. This greatly reduces CPU contention.&lt;/p&gt;
&lt;p&gt;If you have multiple Flink links on an older PG version, upgrading can alleviate certain walsender resource contention issues, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;May resolve the problem where walsender startup resource contention prevents the database from coming up for a long time&lt;/li&gt;
&lt;li&gt;May resolve upstream heavy data changes (including DDL rewrites) causing runtime walsender log decoding CPU saturation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Older PG Cannot Auto-Sync New Partitions
 &lt;div id="older-pg-cannot-auto-sync-new-partitions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#older-pg-cannot-auto-sync-new-partitions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;On older PG versions with declarative partitioning, note that you can &lt;strong&gt;only&lt;/strong&gt; publish child tables individually. &lt;a href="https://www.postgresql.org/docs/release/13.0/" target="_blank" rel="noreferrer"&gt;PG ≥13 supports publishing by parent table&lt;/a&gt;. Below that, you must configure sync per partition child table name:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Allow partitioned tables to be logically replicated via &lt;a href="https://www.postgresql.org/docs/13/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;publications&lt;/a&gt; (Amit Langote) &lt;a href="https://postgr.es/c/17b9e7f9f" target="_blank" rel="noreferrer"&gt;§&lt;/a&gt; &lt;a href="https://postgr.es/c/83fd4532a" target="_blank" rel="noreferrer"&gt;§&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Previously, partitions had to be replicated individually. Now a partitioned table can be published explicitly, causing all its partitions to be published automatically. Addition/removal of a partition causes it to be likewise added to or removed from the publication. The &lt;a href="https://www.postgresql.org/docs/13/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;&lt;code&gt;CREATE PUBLICATION&lt;/code&gt;&lt;/a&gt; option &lt;code&gt;publish_via_partition_root&lt;/code&gt; controls whether changes to partitions are published as their own changes or their parent&amp;rsquo;s.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In other words, if this partitioned table is an upstream for sync, every time a new partition is added, you must adapt the sync tool to publish it — otherwise, new partition data won&amp;rsquo;t sync.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Migration and Upgrades
 &lt;div id="migration-and-upgrades" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#migration-and-upgrades" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Xinchuang Migration and glibc Upgrades
 &lt;div id="xinchuang-migration-and-glibc-upgrades" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#xinchuang-migration-and-glibc-upgrades" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Whether it&amp;rsquo;s Xinchuang (domestic tech migration) or Linux OS version upgrades, glibc upgrades may be involved — and glibc upgrades can be extremely painful. PG sorting was entirely OS-dependent before PG 17.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL cannot detect compatibility issues from glibc upgrades.&lt;/strong&gt; Every minor version of GNU C library makes locale changes. The most problematic version in practice is &lt;strong&gt;glibc 2.28&lt;/strong&gt;, because 2.28 upgraded to a major &lt;strong&gt;Unicode 9.0.0&lt;/strong&gt; release (&lt;a href="https://sourceware.org/glibc/wiki/Release/2.28" target="_blank" rel="noreferrer"&gt;has been updated to a new upstream version from ISO which is in sync with Unicode 9.0.0&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Collations come in many types, and many environments use linguistic sorting (e.g., &lt;code&gt;en_US.utf8&lt;/code&gt;), which is the most version-sensitive. Collation changes most commonly cause database crashes during index scans, but also uncommon issues like duplicate primary keys, data landing in wrong partitions, inconsistent merge join results, etc.&lt;/p&gt;
&lt;p&gt;Fortunately, PG 17 provides a very safe locale provider: &lt;code&gt;builtin&lt;/code&gt;, no longer dependent on OS-provided glibc, ICU, etc. Example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb --locale-provider&lt;span style="color:#f92672"&gt;=&lt;/span&gt;builtin --bultin-locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C.UTF-8 dbname1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However,&lt;/p&gt;
&lt;p&gt;&lt;code&gt;builtin&lt;/code&gt; is great but arrived too late. Converting existing production instances to &lt;code&gt;builtin&lt;/code&gt; collation is no small task. Moreover, Xinchuang migrations or OS upgrades may not mandate database upgrades.&lt;/p&gt;
&lt;p&gt;During Xinchuang migration, the target host&amp;rsquo;s glibc version is typically higher than the old Intel server&amp;rsquo;s — likely crossing version 2.28. Combined with tight deadlines, KPI pressure, staffing shortages, and large databases, physical migration is unavoidable. So physical Xinchuang migration must account for glibc version and collation-induced anomalies.&lt;/p&gt;
&lt;p&gt;What can you do after physical migration?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. Official required steps&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check indexes, rebuild those clearly problematic&lt;/li&gt;
&lt;li&gt;&lt;code&gt;REFRESH DATABASE COLLATION VERSION&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Check dependent objects&lt;/li&gt;
&lt;li&gt;&lt;code&gt;REFRESH COLLATION VERSION&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;II. Unofficial &amp;ldquo;dark arts&amp;rdquo; approaches&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t have a complete solution, just ideas:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Handle partitioned table data landing in wrong partitions&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partition key is int/bigint/float: unrelated to collation, don&amp;rsquo;t worry&lt;/li&gt;
&lt;li&gt;Partition key is timestamp: don&amp;rsquo;t worry; if varchar or other character types: evaluate&lt;/li&gt;
&lt;li&gt;Partition key is character type: refer to &amp;ldquo;a&amp;rdquo; vs &amp;ldquo;-&amp;rdquo; sort order (pgconf Collation Challenges Sorting It Out). But note:
&lt;ul&gt;
&lt;li&gt;If querying data, don&amp;rsquo;t query from the parent table — may crash or return nothing&lt;/li&gt;
&lt;li&gt;No simple detection method&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Handle primary key / unique key conflicts&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Handle FDW sort range anomalies&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unknown issues&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Reference: &lt;a href="https://docs.paic.com.cn/#/post/122695260" target="_blank" rel="noreferrer"&gt;collation&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Smooth Major Version Upgrades
 &lt;div id="smooth-major-version-upgrades" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#smooth-major-version-upgrades" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0077_zero_downtime_major_upgrade.md?ref_type=heads" target="_blank" rel="noreferrer"&gt;https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0077_zero_downtime_major_upgrade.md?ref_type=heads&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4791/slides/439/2023.pgconf.eu%20Zero%20Downtime%20PostgreSQL%20Upgrades.pdf" target="_blank" rel="noreferrer"&gt;https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4791/slides/439/2023.pgconf.eu%20Zero%20Downtime%20PostgreSQL%20Upgrades.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Common major version upgrade approaches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pg_upgrade&lt;/code&gt; in-place upgrade. Not recommended — may blow up in place.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pg_dump&lt;/code&gt;: suitable for small databases, longer maintenance windows.&lt;/li&gt;
&lt;li&gt;Logical sync + switchover (pub/sub, pg_logical, DTS, etc.): suitable for small databases, shorter windows.&lt;/li&gt;
&lt;li&gt;Physical forward sync + logical reverse sync: suitable for large databases, not-too-short windows.&lt;/li&gt;
&lt;li&gt;Physical replication full sync + logical incremental sync + switchover: suitable for large databases, extremely short windows.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syncing full data via logical replication can be extremely slow. In-place upgrade of a new standby carries uncertainty and upgrade time, plus the need for reverse logical sync. &amp;ldquo;Smooth major version upgrade&amp;rdquo; is essentially &amp;ldquo;physical replication full sync + logical incremental sync + switchover.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Key technique: the primary creates a slot and returns an LSN. The new standby uses &lt;code&gt;recovery_target_lsn&lt;/code&gt; to recover to that LSN, then logical sync begins.&lt;/p&gt;
&lt;p&gt;Approximate workflow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Pre-checks. Multi-database (consider applying one slot LSN for all), extensions, pathman, triggers, foreign keys, unlogged tables, crontab, etc.&lt;/li&gt;
&lt;li&gt;Physical sync. Old and new version software, compare and backup conf files, &lt;code&gt;pg_basebackup&lt;/code&gt; to build new standby on old version.&lt;/li&gt;
&lt;li&gt;Logical sync prep 1. Primary keys and replica identity, create publication; prohibit application DDL/DCL.&lt;/li&gt;
&lt;li&gt;Restore new standby to target LSN. Stop new standby; create slot on old primary and record LSN; start new standby with target LSN.&lt;/li&gt;
&lt;li&gt;New standby major version upgrade. Upgrade, handle issues, switch environment variables.&lt;/li&gt;
&lt;li&gt;Logical sync prep 2. Disable triggers, foreign keys, jobs, extensions, etc.&lt;/li&gt;
&lt;li&gt;Logical sync. Create subscription with specified slot, &lt;code&gt;copy_data=false&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Post logical sync. Check for index corruption, check logs for errors and fix, rebuild remote standbys.&lt;/li&gt;
&lt;li&gt;Switchover. Stop application; advance sequences, enable foreign keys, triggers, jobs, etc.&lt;/li&gt;
&lt;li&gt;Switchover. Build reverse link (old primary subscribes).&lt;/li&gt;
&lt;li&gt;Switchover. Application cutover.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The smooth major upgrade approach is smooth for the business but complex for the DBA. It combines all the drawbacks of logical and physical migration — quite painful to execute. The steps above are already simplified. This approach consumes DBA manpower; consider it only for the most critical databases.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Partitioned Table Management
 &lt;div id="partitioned-table-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partitioned-table-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL partitioned tables are very flexible, lack built-in interval partitioning, and have varied behavior across versions — making partition management problems an annual occurrence. I believe many PG DBAs still worry about new partition issues.&lt;/p&gt;
&lt;p&gt;My observations on partition management and usage issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Not using declarative partitioning.&lt;/strong&gt; Older versions still use pathman partitioning or inheritance-based partitioning, or continue using them even after upgrading. Declarative partitioning was introduced in PG 10. Due to early version limitations, recommend &lt;strong&gt;only&lt;/strong&gt; using declarative partitioning from at least PG 12 onward to reduce environmental complexity.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developers building child table indexes/primary keys directly.&lt;/strong&gt; Creating indexes/PKs directly on child tables via SQL rather than through parent table inheritance means the next developer writing SQL may forget. This leads not only to parent-child inconsistency but also child-child inconsistency, eventually making the partition structure unrecognizable.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No new partition management strategy.&lt;/strong&gt; Forgetting to create new partitions or using a DEFAULT partition. Typically, developers create partitions for a few years ahead; next time, the developers may have moved on, and no one manages new partition creation. This is a ticking time bomb, or data lands in the DEFAULT partition, defeating the purpose of partitioning.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Lack of DBA management.&lt;/strong&gt; Yes, DBA! PG partitioned table knowledge is extensive (see &lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;). How to build management strategies and implement them in your environment requires proactive DBA involvement. This may be the most important factor.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My partition management goals (from &lt;a href="https://www.modb.pro/db/2007743085057499136" target="_blank" rel="noreferrer"&gt;Case Study: 2026-01-01 Partition Data Update Failure&lt;/a&gt;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the parent table structure as the canonical structure — the parent table faces developers; it should have primary keys, indexes, and replica identity (unless the PG version doesn&amp;rsquo;t support it).&lt;/li&gt;
&lt;li&gt;Keep parent and child tables consistent. Use &lt;code&gt;PARTITION OF&lt;/code&gt; when creating new partitions (yes, I don&amp;rsquo;t recommend ATTACH).&lt;/li&gt;
&lt;li&gt;Keep child tables consistent with each other.&lt;/li&gt;
&lt;li&gt;Create new partitions in advance. Partition data volume should not be too large.&lt;/li&gt;
&lt;li&gt;DEFAULT partitions are not recommended. If created, must monitor writes to them.&lt;/li&gt;
&lt;li&gt;Queries on frequently accessed tables must include the partition key for partition pruning. Otherwise, convert to a regular table.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Observability
 &lt;div id="observability" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#observability" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/18/monitoring-stats.html" target="_blank" rel="noreferrer"&gt;official documentation&lt;/a&gt; clearly explains database, table, index, SQL, flush, and other metrics.&lt;/p&gt;
&lt;p&gt;A few metrics deserve special attention — not only are they unclearly explained, but they&amp;rsquo;re frequently used and have a learning curve.&lt;/p&gt;

&lt;h3 class="relative group"&gt;buffers_alloc, blks_read
 &lt;div id="buffers_alloc-blks_read" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buffers_alloc-blks_read" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pg_stat_bgwriter.buffers_alloc&lt;/code&gt;: Number of buffers allocated — shared memory eviction volume.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pg_stat_database.blks_read&lt;/code&gt;: OS cache reads.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;(&lt;code&gt;buffers_alloc&lt;/code&gt; may appear in different views across PG versions, but the meaning is the same.)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_stat_bgwriter.buffers_alloc&lt;/code&gt; is the shared memory buffer allocation count (called buffer allocation in the source). It represents shared memory eviction volume — newly started databases typically have higher values. When observing shared memory busyness, buffer allocation may be better than hit ratio — high hit ratios can be inflated by frequent small-table access, while allocation represents actual eviction.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;buffers_alloc&lt;/code&gt; counts buffers allocated after reading from cache and loading into a new shared buffer — somewhat representative of OS cache reads too? But in practice, &lt;code&gt;buffers_alloc&lt;/code&gt; and &lt;code&gt;blks_read&lt;/code&gt; have similar meanings yet can differ significantly in value. Why? Unclear, pending research.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;numBufferAllocs&lt;/code&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;tup_fetched, tup_returned
 &lt;div id="tup_fetched-tup_returned" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tup_fetched-tup_returned" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;These are metrics in &lt;code&gt;pg_stat_database&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tup_fetched&lt;/code&gt;: Number of rows ultimately returned from index scans, after removing filtered rows, dead tuples, and invisible rows. Result-oriented.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;tup_returned&lt;/code&gt;: Number of rows fetched from the table during index scans, regardless of filter conditions, dead tuples, or visibility. Process-oriented.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Thus, &lt;code&gt;tup_returned&lt;/code&gt; is typically much higher than &lt;code&gt;tup_fetched&lt;/code&gt;. An abnormally high &lt;code&gt;tup_returned&lt;/code&gt; suggests optimization opportunity — after all, many rows were accessed but few returned to the client.&lt;/p&gt;

&lt;h3 class="relative group"&gt;idx_tup_fetch, idx_tup_read
 &lt;div id="idx_tup_fetch-idx_tup_read" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#idx_tup_fetch-idx_tup_read" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;These are metrics in &lt;code&gt;pg_stat_all_indexes&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;idx_tup_read&lt;/code&gt;: Number of index entries accessed (counted from the index side), includes bitmap scans.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_tup_fetch&lt;/code&gt;: Number of rows ultimately returned from index scans (counted from the table side), excludes bitmap scans.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Madness.&lt;/p&gt;
&lt;p&gt;One thing to remember: &lt;strong&gt;&lt;code&gt;xx_tup_fetch&lt;/code&gt;&lt;/strong&gt; refers to the final rows returned after index access + table fetch — result-oriented.&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos" target="_blank" rel="noreferrer"&gt;postgres-ai howtos&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgresql.us/events/pgconfnyc2024/sessions/session/1862/slides/172/pgvector_best_practices_pgconfnyc2024.pdf" target="_blank" rel="noreferrer"&gt;Best practices for using pgvector&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/2007743085057499136" target="_blank" rel="noreferrer"&gt;Case Study: 2026-01-01 Partition Data Update Failure&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1976119963471589376" target="_blank" rel="noreferrer"&gt;Case Study: From Inaccurate DISTINCT to DISTINCT Calculation Principles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1964312913808732160" target="_blank" rel="noreferrer"&gt;Case Study: Adding an Index Causes Performance Degradation and Generic Plans&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1966415366276526080" target="_blank" rel="noreferrer"&gt;From Static Table Query Conflicts to Their Principles&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/1948643346948304896" target="_blank" rel="noreferrer"&gt;Control File Parameters and Primary-Standby Parameter Mismatch&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://liuzhilong.blog.csdn.net/article/details/130783036" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net/article/details/130783036&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/blog/adforpostgresql/improving-postgres-connection-scalability-snapshots/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/blog/adforpostgresql/improving-postgres-connection-scalability-snapshots/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/17/sql-prepare.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/17/sql-prepare.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/17/sql-deallocate.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/17/sql-deallocate.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/13.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/13.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jdbc.postgresql.org/documentation/use/" target="_blank" rel="noreferrer"&gt;https://jdbc.postgresql.org/documentation/use/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jdbc.postgresql.org/documentation/server-prepare/#server-prepared-statements" target="_blank" rel="noreferrer"&gt;https://jdbc.postgresql.org/documentation/server-prepare/#server-prepared-statements&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4791/slides/439/2023.pgconf.eu%20Zero%20Downtime%20PostgreSQL%20Upgrades.pdf" target="_blank" rel="noreferrer"&gt;https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4791/slides/439/2023.pgconf.eu%20Zero%20Downtime%20PostgreSQL%20Upgrades.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Thanks to Master Gao for the 2025 battles.&lt;/p&gt;</content:encoded></item><item><title>Case: Partition Data UPDATE Failure on 2026-01-01</title><link>https://lastdba.com/en/2026/01/04/case-partition-data-update-failure-on-2026-01-01/</link><pubDate>Sun, 04 Jan 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/01/04/case-partition-data-update-failure-on-2026-01-01/</guid><description>&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;On December 30, business errors were reported — data could not be updated:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; cannot update table &lt;span style="color:#e6db74"&gt;&amp;#34;tablzl_202601&amp;#34;&lt;/span&gt; because it does not have a replica identity and publishes updates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCATION: CheckCmdReplicaIdentity, execReplication.c:&lt;span style="color:#ae81ff"&gt;575&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Temporary Recovery
 &lt;div id="temporary-recovery" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#temporary-recovery" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The error message was clear: no replica identity. The table was a partitioned table and a 2026 partition, so I immediately suspected the new partition lacked a primary key. (A new table&amp;rsquo;s replica identity defaults to &lt;code&gt;default&lt;/code&gt;, which only uses a primary key as the replica identity. Without a primary key, updates are impossible.)&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;On December 30, business errors were reported — data could not be updated:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; cannot update table &lt;span style="color:#e6db74"&gt;&amp;#34;tablzl_202601&amp;#34;&lt;/span&gt; because it does not have a replica identity and publishes updates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCATION: CheckCmdReplicaIdentity, execReplication.c:&lt;span style="color:#ae81ff"&gt;575&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Temporary Recovery
 &lt;div id="temporary-recovery" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#temporary-recovery" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The error message was clear: no replica identity. The table was a partitioned table and a 2026 partition, so I immediately suspected the new partition lacked a primary key. (A new table&amp;rsquo;s replica identity defaults to &lt;code&gt;default&lt;/code&gt;, which only uses a primary key as the replica identity. Without a primary key, updates are impossible.)&lt;/p&gt;
&lt;p&gt;Further investigation revealed: the parent table had no primary key or indexes, child partitions from 2025 and earlier had both primary keys and indexes, but 2026 and later child partitions had neither — and all child partitions were published. Roughly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_parent &lt;span style="color:#75715e"&gt;-- no PK, no indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_child_202511 &lt;span style="color:#75715e"&gt;-- has PK, has indexes, published
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_child_202512 &lt;span style="color:#75715e"&gt;-- has PK, has indexes, published
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_child_202601 &lt;span style="color:#75715e"&gt;-- no PK, no indexes, published
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;p_child_202602 &lt;span style="color:#75715e"&gt;-- no PK, no indexes, published&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since the parent table had nothing, a &lt;code&gt;partition of&lt;/code&gt; child would also have nothing — you must manually create the primary key and indexes for each child partition. So the new partition creation was problematic; the old partitions presumably had them added after creation.&lt;/p&gt;
&lt;p&gt;Additionally, publishing partitioned tables via the parent was &lt;a href="https://www.postgresql.org/docs/release/13.0/" target="_blank" rel="noreferrer"&gt;only supported starting from PG13&lt;/a&gt;. Previously, you couldn&amp;rsquo;t publish via the parent — only via child tables. This database was on PG11.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Allow partitioned tables to be logically replicated via &lt;a href="https://www.postgresql.org/docs/13/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;publications&lt;/a&gt; (Amit Langote) &lt;a href="https://postgr.es/c/17b9e7f9f" target="_blank" rel="noreferrer"&gt;§&lt;/a&gt; &lt;a href="https://postgr.es/c/83fd4532a" target="_blank" rel="noreferrer"&gt;§&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Previously, partitions had to be replicated individually. Now a partitioned table can be published explicitly, causing all its partitions to be published automatically. Addition/removal of a partition causes it to be likewise added to or removed from the publication. The &lt;a href="https://www.postgresql.org/docs/13/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;&lt;code&gt;CREATE PUBLICATION&lt;/code&gt;&lt;/a&gt; option &lt;code&gt;publish_via_partition_root&lt;/code&gt; controls whether changes to partitions are published as their own changes or their parent&amp;rsquo;s.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;After the initial diagnosis and given the urgency, there were three ways to temporarily resolve:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add primary keys to the 2026 partitions&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;replica identity full&lt;/code&gt; on the 2026 partitions&lt;/li&gt;
&lt;li&gt;Remove the 2026 partitions from the publication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since recovery time was about the same for all options, we chose adding primary keys — the lowest operational cost — to at least stop the business errors.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Root Cause Analysis
 &lt;div id="root-cause-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#root-cause-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The issue seems clear: &amp;ldquo;no replica identity + published + no primary key&amp;rdquo; prevents updates. But several questions still needed answers.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Question 1: Why does the UPDATE fail even though there&amp;rsquo;s no 202601 data at all (the new partition has zero rows)?
 &lt;div id="question-1-why-does-the-update-fail-even-though-theres-no-202601-data-at-all-the-new-partition-has-zero-rows" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#question-1-why-does-the-update-fail-even-though-theres-no-202601-data-at-all-the-new-partition-has-zero-rows" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The SQL text was:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; tablzl_202601
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; idid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_updated &lt;span style="color:#f92672"&gt;=&lt;/span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; mykey &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The partition key for &lt;code&gt;tablzl_202601&lt;/code&gt; is &lt;code&gt;created_date&lt;/code&gt;. The SQL WHERE clause didn&amp;rsquo;t include the partition key, so when attempting to update the 202601 partition, it found no primary key and errored out.&lt;/p&gt;
&lt;p&gt;As for whether row existence or replica identity is checked first, we can see from &lt;code&gt;ExecSimpleRelationUpdate&lt;/code&gt;. This function has changed very little across PG versions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Find the searchslot tuple and update it with data in the slot,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * update the indexes, and execute any constraints and per-row triggers.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Caller is responsible for opening the indexes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExecSimpleRelationUpdate&lt;/span&gt;(EState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;estate, EPQState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;epqstate,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 TupleTableSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;searchslot, TupleTableSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;CheckCmdReplicaIdentity&lt;/span&gt;(rel, CMD_UPDATE); &lt;span style="color:#75715e"&gt;// check replica identity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* BEFORE ROW UPDATE Triggers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (resultRelInfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ri_TrigDesc &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		resultRelInfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ri_TrigDesc&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;trig_update_before_row)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		slot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ExecBRUpdateTriggers&lt;/span&gt;(estate, epqstate, resultRelInfo,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;searchslot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									NULL, slot);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (slot &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)		&lt;span style="color:#75715e"&gt;/* &amp;#34;do nothing&amp;#34; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			skip_tuple &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;skip_tuple)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;recheckIndexes &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NIL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Check the constraints of the tuple */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_att&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;constr)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExecConstraints&lt;/span&gt;(resultRelInfo, slot, estate);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (resultRelInfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ri_PartitionCheck)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExecPartitionCheck&lt;/span&gt;(resultRelInfo, slot, estate, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Materialize slot into a tuple that we can scribble upon. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		tuple &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ExecMaterializeSlot&lt;/span&gt;(slot);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* OK, update the tuple and index entries for it */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;simple_heap_update&lt;/span&gt;(rel, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;searchslot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (resultRelInfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ri_NumIndices &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleIsHeapOnly&lt;/span&gt;(slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			recheckIndexes &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ExecInsertIndexTuples&lt;/span&gt;(slot, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 estate, false, NULL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 NIL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* AFTER ROW UPDATE Triggers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ExecARUpdateTriggers&lt;/span&gt;(estate, resultRelInfo,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;searchslot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tts_tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 NULL, tuple, recheckIndexes, NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;list_free&lt;/span&gt;(recheckIndexes);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;ExecSimpleRelationUpdate&lt;/code&gt; flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check replica identity&lt;/li&gt;
&lt;li&gt;BEFORE ROW UPDATE triggers&lt;/li&gt;
&lt;li&gt;Check constraints (both non-partition and partition constraints)&lt;/li&gt;
&lt;li&gt;Update the row&lt;/li&gt;
&lt;li&gt;Insert index entries&lt;/li&gt;
&lt;li&gt;AFTER ROW UPDATE triggers&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;So PG&amp;rsquo;s logic checks replica identity first, before row updates and everything else.&lt;/p&gt;
&lt;p&gt;Even though the SQL didn&amp;rsquo;t include the partition key, would adding it trigger partition pruning? The answer is: maybe not.&lt;/p&gt;
&lt;p&gt;Partition pruning improvements across versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG10 introduced declarative partitioning. There was no &lt;code&gt;enable_partition_pruning&lt;/code&gt; parameter; pruning was done at planning time via &lt;code&gt;constraint_exclusion&lt;/code&gt;. So PG10 had no query-execution-time pruning.&lt;/li&gt;
&lt;li&gt;PG11 added runtime partition pruning: &lt;a href="https://www.postgresql.org/docs/release/11.0/" target="_blank" rel="noreferrer"&gt;Allow partition elimination during query execution (David Rowley, Beena Emerson)&lt;/a&gt;. But it only supports pruning with bound variables, not non-immutable functions (including &lt;code&gt;now()&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/release/14.0/" target="_blank" rel="noreferrer"&gt;PG14&lt;/a&gt; added final pruning: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c5b7ba4e6" target="_blank" rel="noreferrer"&gt;This wins in UPDATEs on partitioned tables when only some of the partitions will actually receive updates&lt;/a&gt;. i.e., supports pruning with non-immutable functions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since PG11 doesn&amp;rsquo;t support &lt;code&gt;now()&lt;/code&gt; pruning, adding a &lt;code&gt;now()&lt;/code&gt; condition to the business SQL wouldn&amp;rsquo;t trigger pruning — the error would still occur. However, if the business passed a bound variable, pruning would trigger and the error wouldn&amp;rsquo;t appear. Note: &amp;ldquo;the error wouldn&amp;rsquo;t appear&amp;rdquo; means updating 202512 data wouldn&amp;rsquo;t error out on the 202601 partition; updating 202601 data would still fail regardless.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Question 2: The partition was created on 2025-12-26, so why was the problem only discovered on December 30?
 &lt;div id="question-2-the-partition-was-created-on-2025-12-26-so-why-was-the-problem-only-discovered-on-december-30" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#question-2-the-partition-was-created-on-2025-12-26-so-why-was-the-problem-only-discovered-on-december-30" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;This is even simpler: &amp;ldquo;no replica identity + published + no primary key&amp;rdquo; is an AND condition.&lt;/p&gt;
&lt;p&gt;Although the new partitions were created early, they were published on the evening of December 29 at 20:47:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat postgresql&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;.csv.bak &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#e6db74"&gt;&amp;#34;alter publication&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;730&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;userlzlreplication&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,xxx&lt;span style="color:#e6db74"&gt;&amp;#34;statement: alter publication publzl add table &amp;#34;&amp;#34;public&amp;#34;&amp;#34;.&amp;#34;&amp;#34;tablzl_202601&amp;#34;&amp;#34;, &amp;#34;&amp;#34;public&amp;#34;&amp;#34;.&amp;#34;&amp;#34;tablzl_202602&amp;#34;&amp;#34;,...&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first error appeared on December 29 at 22:26, about 1.5 hours later:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cat postgresql&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;.csv.bak &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#e6db74"&gt;&amp;#34;REPLICA IDENTITY&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;404&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;userlzlreplication&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;375121&lt;/span&gt;,xxx,&lt;span style="color:#e6db74"&gt;&amp;#34;cannot update table &amp;#34;&amp;#34;tablzl_202601&amp;#34;&amp;#34; because it does not have a replica identity and publishes updates&amp;#34;&lt;/span&gt;,,&lt;span style="color:#e6db74"&gt;&amp;#34;To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.&amp;#34;&lt;/span&gt;,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE tablzl&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Root cause overview: The parent table had no primary key, so &lt;code&gt;partition of&lt;/code&gt; child partitions naturally also had none. Old child partitions had their primary keys added manually; new child partitions did not, resulting in the 202601 partition lacking a primary key. Logical replication relies on the primary key (default replica identity) for synchronization. Without replica identity, changes can&amp;rsquo;t be sent downstream, and UPDATE/DELETE statements on published tables cannot execute. In PG11, an UPDATE SQL that &lt;em&gt;does&lt;/em&gt; include the partition key condition may &lt;em&gt;still&lt;/em&gt; visit the new partition.&lt;/p&gt;
&lt;p&gt;A stroke of luck: Due to various factors, this problem was discovered early in this particular database. We had a one-day buffer on December 31 to fix all database instances, ensuring at least that January 1 new partition data updates wouldn&amp;rsquo;t error out. Otherwise, on January 1, 2026, multiple systems would have likely gone up in flames.&lt;/p&gt;
&lt;p&gt;Temporary measures (pick one):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add primary keys to 2026 partitions&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;replica identity full&lt;/code&gt; on 2026 partitions&lt;/li&gt;
&lt;li&gt;Remove 2026 partitions from the publication&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For replication pipeline optimization:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tables without primary keys should be detected proactively, otherwise publishing them could cause business-side UPDATE failures&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For partition management strategy:&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s partitioned tables are highly flexible, and developers generally don&amp;rsquo;t know how to create partitions correctly. Combined with significant new partitioning features across roughly PG10-15, and the lack of INTERVAL partitioning in PG, partitioned tables can end up a mess. Standardized management of partitioned tables is thus critical. For partition table features and operational tips, see: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;As for management tools, I&amp;rsquo;ll skip those.&lt;/p&gt;
&lt;p&gt;Management goals:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use the parent table structure as the standard: the parent table, being developer-facing, should have primary keys, indexes, and replica identity (unless the PG version doesn&amp;rsquo;t support it)&lt;/li&gt;
&lt;li&gt;Keep parent and child tables consistent; use &lt;code&gt;partition of&lt;/code&gt; to create new partitions (yes, I don&amp;rsquo;t recommend &lt;code&gt;attach&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Keep child tables consistent with each other&lt;/li&gt;
&lt;li&gt;Create new partitions in advance; partition data volumes should not be excessive&lt;/li&gt;
&lt;li&gt;Default partitions are not recommended; if created, their writes must be monitored&lt;/li&gt;
&lt;li&gt;Frequently accessed tables must have partition keys in their SQL queries and use partition pruning; otherwise, convert them to regular tables&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/10.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/10.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/11.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/11.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/12.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/12.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/13.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/13.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/14.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/14.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;src/backend/executor/execReplication.c&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Paper Deep Read: Anarchy in the Database</title><link>https://lastdba.com/en/2026/01/03/paper-deep-read-anarchy-in-the-database/</link><pubDate>Sat, 03 Jan 2026 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2026/01/03/paper-deep-read-anarchy-in-the-database/</guid><description>&lt;p&gt;Paper: Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility&lt;/p&gt;
&lt;p&gt;GitHub: &lt;a href="https://github.com/cmu-db/ext-analyzer" target="_blank" rel="noreferrer"&gt;https://github.com/cmu-db/ext-analyzer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PGConf: The trouble with extensions (PGConf.dev 2025)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why This Paper
 &lt;div id="why-this-paper" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-this-paper" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This is a survey of database extensions (mainly Postgres), covering the implementation approaches of extensions across different databases, existing problems, and most importantly, compatibility. The most significant finding: an evaluation of over 400 PostgreSQL extensions shows that 16.8% of extensions have compatibility issues with at least one other extension, potentially leading to system failures.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Paper: Anarchy in the Database: A Survey and Evaluation of Database Management System Extensibility&lt;/p&gt;
&lt;p&gt;GitHub: &lt;a href="https://github.com/cmu-db/ext-analyzer" target="_blank" rel="noreferrer"&gt;https://github.com/cmu-db/ext-analyzer&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PGConf: The trouble with extensions (PGConf.dev 2025)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why This Paper
 &lt;div id="why-this-paper" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-this-paper" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This is a survey of database extensions (mainly Postgres), covering the implementation approaches of extensions across different databases, existing problems, and most importantly, compatibility. The most significant finding: an evaluation of over 400 PostgreSQL extensions shows that 16.8% of extensions have compatibility issues with at least one other extension, potentially leading to system failures.&lt;/p&gt;
&lt;p&gt;Analysis tools and results are on GitHub; Marco Slot&amp;rsquo;s presentation is at PGConf.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Extension Categories
 &lt;div id="extension-categories" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#extension-categories" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Extension Classification
 &lt;div id="extension-classification" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#extension-classification" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The extension classification chapter is particularly lengthy — a single diagram actually clarifies everything.&lt;/p&gt;
&lt;p&gt;Extensions across 6 databases:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c2025f80a5c9.png" alt="image-20251228140624785" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PostgreSQL (1986): Written in C, designed from the beginning as an extensible architecture. Consequently, PostgreSQL has the richest and most diverse extensible ecosystem.&lt;/li&gt;
&lt;li&gt;MySQL (1994): Written in C++, best known for its storage engine plugin architecture.&lt;/li&gt;
&lt;li&gt;MariaDB (2009): A fork of MySQL, also C++ based, supporting more extensions than the original MySQL.&lt;/li&gt;
&lt;li&gt;SQLite (2000): Embedded database written in C, adaptable to various hardware devices and operating systems.&lt;/li&gt;
&lt;li&gt;Redis (2009): In-memory key-value store written in C++, uniquely extensible — only supports running above the DBMS key-value storage layer.&lt;/li&gt;
&lt;li&gt;DuckDB (2018): Embedded analytical database written in C++, with a rapidly emerging extensible ecosystem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Flexibility and Security
 &lt;div id="flexibility-and-security" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#flexibility-and-security" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Extension security and flexibility are a trade-off — PG extensions are the most flexible but least secure; Redis is the most secure but least flexible:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a4b3110396a3.png" alt="image-20260103140026801" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;How PostgreSQL Extensions Are Typically Implemented
 &lt;div id="how-postgresql-extensions-are-typically-implemented" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-postgresql-extensions-are-typically-implemented" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG generally has two ways to implement extensions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Through handler functions, such as UDFs, UDTs, external tables, storage engines, and index access methods.&lt;/li&gt;
&lt;li&gt;Through hooks. Hooks are declared as function pointers in global variables; if a hook is set, it will call these pointers instead of its own code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Implementations may use both approaches — they&amp;rsquo;re not mutually exclusive. The other 5 databases have generally similar implementations, but &lt;strong&gt;none of them have hook-based implementations&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a0199291618a.png" alt="image-20251228170307440" /&gt;&lt;/p&gt;
&lt;p&gt;Extensions may use different implementation approaches, e.g., function + types + index AM — this is the number of extensibility types. From Figure 1, we can see that extensions with 1-3 types are the most common, and the most-used implementation approach is function.&lt;/p&gt;
&lt;p&gt;From Table 3, 92.5% of extensions use UDFs — after all, it&amp;rsquo;s a user-facing feature, easiest to develop with the lowest barrier to entry. The least used is client authentication, as this scenario itself is uncommon.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Extension Code Copy Rate
 &lt;div id="extension-code-copy-rate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#extension-code-copy-rate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The paper also conducted an interesting survey: the extent to which extension code is copied from built-in code:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/69d841de287f.png" alt="image-20260103104107929" /&gt;&lt;/p&gt;
&lt;p&gt;Out of 441 extensions, 16.6% — 73 extensions — contain at least one line copied from PG source code. The detailed distribution is shown in the left chart above.&lt;/p&gt;
&lt;p&gt;Why are so many extensions copying code? Because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Some functions in PG source are declared static, only callable within their own file, so they can only be copied.&lt;/li&gt;
&lt;li&gt;Due to the extension&amp;rsquo;s own requirements, functions may need slight adjustments, so they can only be copied and adjusted.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;And how much were these copied functions adjusted? See the right chart above.&lt;/p&gt;
&lt;p&gt;As can be seen, unmodified copies are actually rare.&lt;/p&gt;
&lt;p&gt;In summary, extension code is copied from PG source out of necessity, and the overall copy rate isn&amp;rsquo;t high.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Heavyweight! — PG Extension Compatibility
 &lt;div id="the-heavyweight--pg-extension-compatibility" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-heavyweight--pg-extension-compatibility" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This is the most interesting part of the paper: pairwise compatibility testing was conducted on 96 extensions, and testing found that 16.8% of extension pairs are incompatible!&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6d87c80af09b.png" alt="image-20260103111359805" /&gt;&lt;/p&gt;
&lt;p&gt;Testing methodology:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Installation. Yes, installation alone can cause problems. The authors tested both A→B and B→A installation orders, hence the asymmetric diagram.&lt;/li&gt;
&lt;li&gt;Running the extension&amp;rsquo;s provided unit tests.&lt;/li&gt;
&lt;li&gt;pgbench. Smoke testing. pgbench is of course simple, but good results here can still indicate something.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Among the top 20 least compatible extensions, many commonly-used ones appear:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Common extensions: pg_hint_plan, vector, pg_show_plans, pgsentinel, pg_cron, pg_stat_kcache&lt;/li&gt;
&lt;li&gt;Heavy extensions: citus, timescaledb&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The fact that such extremely common and star extensions can have such poor compatibility is jaw-dropping.&lt;/p&gt;
&lt;p&gt;What&amp;rsquo;s even more chilling: this is just simple pairwise testing. Running 3-10 extensions should be the production norm, and production environments are far more complex and variable than the paper&amp;rsquo;s three testing methods.&lt;/p&gt;
&lt;p&gt;Finally, the paper identifies the reason for poor extension compatibility: extensions that use more components, extension types, and hooks are more likely to be incompatible with other extensions.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Nitpicking
 &lt;div id="nitpicking" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#nitpicking" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;It&amp;rsquo;s really still about Postgres&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The paper&amp;rsquo;s title says DBMS, but it&amp;rsquo;s mainly about PG compatibility. MySQL, Redis, etc. compatibility is only covered in the survey, with no experimental data at all. (Though the survey is interesting — you can learn how MySQL and Redis extensions are implemented.)&lt;/p&gt;
&lt;p&gt;On the other hand, this paper has a kind of alternative &amp;ldquo;general-specific-general&amp;rdquo; feel: &amp;ldquo;DBMS-Postgres-DBMS&amp;rdquo; &amp;#x1f605;&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Insufficient compatibility testing&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;PG has 400+ extensions, but only 96 were tested for compatibility, and only 1-on-1 compatibility testing, without tests involving 3 or more extensions. The compatibility testing isn&amp;rsquo;t particularly comprehensive.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Conclusion
 &lt;div id="conclusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#conclusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PG extensions are indeed numerous and flexible — you&amp;rsquo;d struggle to find functionality that PG extensions &lt;em&gt;don&amp;rsquo;t&lt;/em&gt; support. But the extensions themselves are almost in a state of &amp;ldquo;anarchy&amp;rdquo; — both extension development and usage have problems.&lt;/p&gt;
&lt;p&gt;From the compatibility results, extension compatibility is quite poor — even the installation order affects compatibility. Multiple extensions also depend on hook execution order; for example, two extensions both requiring themselves to execute last becomes awkward. &amp;ldquo;Having everything&amp;rdquo; doesn&amp;rsquo;t mean &amp;ldquo;install everything.&amp;rdquo;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Extension Security Issues
 &lt;div id="extension-security-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#extension-security-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG extensions have virtually no security management, whether from inherently unsafe extensions or user privilege escalation through extensions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;If an extension contains unsafe languages, only the OS can restrict its behavior, not the DBMS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If an extension can access user space, the OS layer cannot manage it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Extensions implemented through queries (e.g., UDFs) generally won&amp;rsquo;t bypass ACL policies. While UDFs are more secure, they&amp;rsquo;re not absolutely secure, as UDFs with admin privileges can exist.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A single hook may not be restricted by ACL, because in PostgreSQL, ACL is only enforced at the planning and execution layers. PG provides &lt;code&gt;SECURITY LABEL&lt;/code&gt; to restrict access control for objects (including extensions).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Philosophical Thoughts on Software Management
 &lt;div id="philosophical-thoughts-on-software-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#philosophical-thoughts-on-software-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&amp;ldquo;If an extension contains unsafe languages, only the OS can restrict its behavior, not the DBMS.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;This statement itself isn&amp;rsquo;t wrong, but it carries an implication of &amp;ldquo;your directory could be deleted.&amp;rdquo; To counter this, consider the following:&lt;/p&gt;
&lt;p&gt;If you use this software, you trust it, just like PG itself (but even when using PG, you create a postgres OS user rather than using root directly). As for extensions, treat them as part of the PG software. PG is trusted and can be installed directly in production because of its industry reputation. The same goes for extensions — choose reputable extensions rather than using them indiscriminately. This is essentially the difference between PostgreSQL community gatekeeping and extension provider gatekeeping. For cloud service providers, many extensions aren&amp;rsquo;t supported — the cloud provider assumes the gatekeeping function and the responsibility of taking the blame.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Version Convergence
 &lt;div id="version-convergence" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#version-convergence" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG extension versions have these characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The same extension may have different extension packages for different database versions.&lt;/li&gt;
&lt;li&gt;Extensions have different versions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This means that without version management, you&amp;rsquo;ll end up with unmanageable numbers of software versions. To address this, limiting specific PG versions to installing specific extension versions is a good approach. As for extension upgrades needed for certain requirements, implement them through PG version upgrades. This strategy sacrifices some flexibility to ensure stability. I personally think it&amp;rsquo;s worthwhile — the need to upgrade extensions itself isn&amp;rsquo;t common, but it can reduce many software management issues and unknown compatibility problems.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Consider Compatibility When Using Extensions
 &lt;div id="consider-compatibility-when-using-extensions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#consider-compatibility-when-using-extensions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since extension compatibility isn&amp;rsquo;t great, &lt;strong&gt;managing extensions becomes especially important&lt;/strong&gt; — we don&amp;rsquo;t want the database returning strange results or even crashing while running.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Extension management strategy: 1. Install necessary extensions. 2. Create needed extensions on demand. 3. Don&amp;rsquo;t install obscure extensions.&lt;/li&gt;
&lt;li&gt;Search the compatibility matrix. While PG compatibility testing isn&amp;rsquo;t perfect, it&amp;rsquo;s still valuable. Since the paper isn&amp;rsquo;t directly searchable for the compatibility matrix, you can &amp;ldquo;ctrl+f&amp;rdquo; search the &lt;a href="https://github.com/cmu-db/ext-analyzer/blob/main/plot_scripts/csvs/compatibility_results.csv" target="_blank" rel="noreferrer"&gt;ext-analyzer compatibility table&lt;/a&gt; to preliminarily assess whether extensions you need have good compatibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Trivia
 &lt;div id="trivia" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#trivia" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;In the 1976 INGRES paper, UDFs were already implemented through extensions. Even POSTGRES carried forward this functionality in its 1986 initial release. Oracle&amp;rsquo;s UDF implementation came in Oracle 7, released in &lt;a href="https://www.orafaq.com/wiki/Oracle_7" target="_blank" rel="noreferrer"&gt;1992&lt;/a&gt; — much later than PG.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/da3915cf7a37.png" alt="image-20251228104850349" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2830dc6d8871.png" alt="image-20251228104840046" /&gt;&lt;/p&gt;
&lt;p&gt;The SQL standard didn&amp;rsquo;t include UDFs until 1996 — a full 20 years after INGRES&amp;rsquo;s UDF. Stonebraker indeed wasn&amp;rsquo;t very focused on driving standards.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Original link: &lt;a href="https://lastdba.com/2026/01/03/" target="_blank" rel="noreferrer"&gt;https://lastdba.com/2026/01/03/&lt;/a&gt;论文精读插件无政府状态/&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>Case Study: Row Locks and LWLock LockManager</title><link>https://lastdba.com/en/2025/12/21/case-study-row-locks-and-lwlock-lockmanager/</link><pubDate>Sun, 21 Dec 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/12/21/case-study-row-locks-and-lwlock-lockmanager/</guid><description>&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database showed a large number of row locks and a smaller number of LWLock LockManager waits. CPU was maxed out and active sessions spiked. The blocking PID associated with the locks kept changing, with no obvious long-transaction blocker.
(Imagine high CPU and active sessions.)&lt;/p&gt;
&lt;p&gt;The SQL corresponding to the large number of locks was as follows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; lzl_record &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; rc_lzl1&lt;span style="color:#f92672"&gt;=&lt;/span&gt; rc_lzl1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, pc_lzl2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pc_lzl2 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, rc_lzl3 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; rc_lzl3 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;No Increase in SQL Concurrency Observed
 &lt;div id="no-increase-in-sql-concurrency-observed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#no-increase-in-sql-concurrency-observed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;From the correlation between hits and CPU, we can analyze from the SQL hit perspective. That UPDATE SQL accounted for about 80% of activity. The SQL&amp;rsquo;s execution count had not changed, but &lt;code&gt;blks hit&lt;/code&gt; was clearly abnormal.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database showed a large number of row locks and a smaller number of LWLock LockManager waits. CPU was maxed out and active sessions spiked. The blocking PID associated with the locks kept changing, with no obvious long-transaction blocker.
(Imagine high CPU and active sessions.)&lt;/p&gt;
&lt;p&gt;The SQL corresponding to the large number of locks was as follows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; lzl_record &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; rc_lzl1&lt;span style="color:#f92672"&gt;=&lt;/span&gt; rc_lzl1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, pc_lzl2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pc_lzl2 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, rc_lzl3 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; rc_lzl3 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;No Increase in SQL Concurrency Observed
 &lt;div id="no-increase-in-sql-concurrency-observed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#no-increase-in-sql-concurrency-observed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;From the correlation between hits and CPU, we can analyze from the SQL hit perspective. That UPDATE SQL accounted for about 80% of activity. The SQL&amp;rsquo;s execution count had not changed, but &lt;code&gt;blks hit&lt;/code&gt; was clearly abnormal.&lt;/p&gt;
&lt;p&gt;We also analyzed metadata access — within snapshots, no metadata tables showed unusually high access.&lt;/p&gt;
&lt;p&gt;From the symptom analysis, neither SQL concurrency increase nor metadata anomalies were apparent. The reason for the SQL hit increase wasn&amp;rsquo;t obvious at this point.&lt;/p&gt;

&lt;h3 class="relative group"&gt;LWLock LockManager Analysis
 &lt;div id="lwlock-lockmanager-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lwlock-lockmanager-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since the SQL itself is simple — the &lt;code&gt;lzl_id&lt;/code&gt; field in the &lt;code&gt;lzl_record&lt;/code&gt; table is a unique field, meaning the update is done by unique key.&lt;/p&gt;
&lt;p&gt;In addition to the large number of explicit locks, the wait events at the scene also included LWLock LockManager.&lt;/p&gt;
&lt;p&gt;However, the table is a regular table (not partitioned), with only 4 or 5 indexes on it.&lt;/p&gt;
&lt;p&gt;LWLock LockManager is related to not using the fast path. Simple queries and DML can use the fast path:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Weak relation locks. SELECT, INSERT, UPDATE, and DELETE must acquire a
lock on every relation they operate on, as well as various system catalogs
that can be used internally. Many DML operations can proceed in parallel
against the same table at the same time; only DDL operations such as
CLUSTER, ALTER TABLE, or DROP &amp;ndash; or explicit user action such as LOCK TABLE
&amp;ndash; will create lock conflicts with the &amp;ldquo;weak&amp;rdquo; locks (AccessShareLock,
RowShareLock, RowExclusiveLock) acquired by DML operations.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;So a SELECT/DML accessing no more than 16 relations (including indexes) should be able to use the fast path, and there shouldn&amp;rsquo;t be much LWLock LockManager.&lt;/p&gt;
&lt;p&gt;However, DML certainly can&amp;rsquo;t simply use the fast path — fast path handles lock operations entirely locally, but DML must check whether other sessions hold locks on the row and needs to access shared memory. Combined with the fact that this SQL updates by unique field yet still encounters row locks, it must be updating the same row.&lt;/p&gt;
&lt;p&gt;From the logs, we could see instances of updating the same row — one row had tens of thousands of lock-waiting updates.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Benchmark Testing
 &lt;div id="benchmark-testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#benchmark-testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Benchmarking Same-Row Updates to Reproduce LWLock LockManager
 &lt;div id="benchmarking-same-row-updates-to-reproduce-lwlock-lockmanager" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#benchmarking-same-row-updates-to-reproduce-lwlock-lockmanager" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Given that row locks definitely can&amp;rsquo;t rely solely on the fast path, and knowing that LWLock LockManager degrades database performance, we benchmarked different scenarios.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;#prompt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Give me a pgbench benchmark script
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Table structure: primary key, unique field + unique index, other fields
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Update: update by unique field
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Benchmark repeated updates on the same row (repeated row-lock updates)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Benchmark random updates on different rows (no row-lock updates)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Script omitted. Environment: 20 cores, 96GB RAM.&lt;/p&gt;
&lt;p&gt;pgbench commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgbench -h localhost -p $PGPORT -d lzldb -U dbmgr -f update_same_unique_key.sql -c &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; -j &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; -T &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; -r -S
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgbench -h localhost -p $PGPORT -d lzldb -U dbmgr -f update_random_unique_key.sql -c &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; -j &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; -T &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; -r -S&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Wait events during the benchmark:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Update same row, 2 typical samples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cnt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+--------+---------------------+-----------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LockManager &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;105&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALSync &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cnt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+--------+---------------------+-----------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;180&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LockManager &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALSync &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Update different rows, 2 typical samples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cnt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------------------+---------------------+-----------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; BufferMapping &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cnt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------------------+---------------------+-----------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idle &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; XactGroupUpdate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IPC &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALSync &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; XactSLRU &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; BufferContent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the wait events, the difference is clear: updating the same row produces LWLock LockManager, sometimes at a high proportion. Updating different rows mostly just waits on CPU. Scenario 1 matches the production situation.&lt;/p&gt;

&lt;h2 class="relative group"&gt;A Brief Analysis of Row Locks and Fast Path
 &lt;div id="a-brief-analysis-of-row-locks-and-fast-path" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-brief-analysis-of-row-locks-and-fast-path" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The lmgr README&amp;rsquo;s explanation of the fast path:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Fast Path Locking
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Fast path locking is a special purpose mechanism designed to reduce the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;overhead of taking and releasing certain types of locks which are taken
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;and released very frequently but rarely conflict. Currently, this includes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;two categories of locks:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(1) Weak relation locks. SELECT, INSERT, UPDATE, and DELETE must acquire a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lock on every relation they operate on, as well as various system catalogs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;that can be used internally. Many DML operations can proceed in parallel
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;against the same table at the same time; only DDL operations such as
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CLUSTER, ALTER TABLE, or DROP -- or explicit user action such as LOCK TABLE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- will create lock conflicts with the &amp;#34;weak&amp;#34; locks (AccessShareLock,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RowShareLock, RowExclusiveLock) acquired by DML operations.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Conditions for locks that can use the fast path, from &lt;code&gt;lmgr/lock.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The fast-path lock mechanism is concerned only with relation locks on
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * unshared relations by backends bound to a database. The fast-path
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * mechanism exists mostly to accelerate acquisition and release of locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * that rarely conflict. Because ShareUpdateExclusiveLock is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * self-conflicting, it can&amp;#39;t use the fast-path mechanism; but it also does
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * not conflict with any of the locks that do, so we can ignore it completely.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define EligibleForRelationFastPath(locktag, mode) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	((locktag)-&amp;gt;locktag_lockmethodid == DEFAULT_LOCKMETHOD &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(locktag)-&amp;gt;locktag_type == LOCKTAG_RELATION &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(locktag)-&amp;gt;locktag_field1 == MyDatabaseId &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	MyDatabaseId != InvalidOid &amp;amp;&amp;amp; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(mode) &amp;lt; ShareUpdateExclusiveLock)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;SELECT/DML can use the fast path, but only for &lt;code&gt;locktype=relation&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s look at the actual lock situation when there&amp;rsquo;s a row lock:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_backend_pid()) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; pid,locktype;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; page &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; classid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; objid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; objsubid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; fastpath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------+----------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+--------+------------------+---------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706189&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706190&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706187&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706187&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG&amp;rsquo;s row lock implementation is quite complex — it involves not only tuple locks, but also transactionid and relation locks. Among these, only &lt;code&gt;locktype=relation&lt;/code&gt; and &lt;code&gt;virtualxid&lt;/code&gt; can use the fast path; all others cannot.&lt;/p&gt;
&lt;p&gt;Compare with the no-row-lock case:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_backend_pid()) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; pid,locktype;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; page &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tuple &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; classid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; objid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; objsubid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; fastpath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------+----------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+--------+------------------+---------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706214&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4792&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;220559&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4267681&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5290151&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;170706212&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;563&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;253641&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There are only 2-3 fewer &lt;code&gt;fastpath=f&lt;/code&gt; entries. The transactionid locks held by both sessions definitely can&amp;rsquo;t use the fast path.&lt;/p&gt;
&lt;p&gt;Summary of conditions for using the fast-path lock mechanism (all must be met):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lock level &amp;lt;= 3, i.e., SELECT/DML statements&lt;/li&gt;
&lt;li&gt;&lt;code&gt;locktype=relation&lt;/code&gt;. PG&amp;rsquo;s row locks also require at least transactionid and tuple locks, so these two can&amp;rsquo;t use the fast path&lt;/li&gt;
&lt;li&gt;Fewer than 16 relations accessed (typically exceeded only with full partition access on partitioned tables)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Conclusion
 &lt;div id="conclusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#conclusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Is the row lock the cause or the effect? Is it a row lock problem, or did database performance degrade causing SQL to run slower and produce row locks?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Row lock is the cause. The SQL execution count didn&amp;rsquo;t change, but the SQL parameters shifted from scattered to concentrated — i.e., updates to the same row noticeably increased. From the benchmark data, updating the same row produces row lock and LWLock LockManager waits.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;SQL execution count didn&amp;rsquo;t increase — did SQL performance degrade?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;SQL performance did degrade, but the index was definitely not chosen incorrectly — it was simply because the same row was being updated repeatedly.&lt;/p&gt;
&lt;p&gt;Solution:&lt;/p&gt;
&lt;p&gt;From the business side, the SQL was tied to a certain API endpoint: after being called, it updates the call count into the table. If the same endpoint is called repeatedly, it&amp;rsquo;s possible to repeatedly update the same row. Therefore, reducing repeated calls to the same endpoint, or batching the database updates into fewer, larger batches, is expected to mitigate this problem.&lt;/p&gt;</content:encoded></item><item><title>My 2025 Year-End Summary</title><link>https://lastdba.com/en/2025/12/21/my-2025-year-end-summary/</link><pubDate>Sun, 21 Dec 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/12/21/my-2025-year-end-summary/</guid><description>&lt;h2 class="relative group"&gt;As a DBA
 &lt;div id="as-a-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#as-a-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;As a DBA, I strongly believe in first principles and information theory when it comes to problem analysis. A DBA needs to deeply understand the system, understand PostgreSQL, to explain anomalies from first principles. For example, in the first half of the year I spent considerable effort understanding Linux memory, exploring the essence of memory issues and their solutions. At the same time, this year I took a step forward in system operations — no longer focusing solely on technical problems and handling, but more on providing solutions. These should encompass thinking across the PostgreSQL database technology dimension, the system dimension, and the management dimension.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;As a DBA
 &lt;div id="as-a-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#as-a-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;As a DBA, I strongly believe in first principles and information theory when it comes to problem analysis. A DBA needs to deeply understand the system, understand PostgreSQL, to explain anomalies from first principles. For example, in the first half of the year I spent considerable effort understanding Linux memory, exploring the essence of memory issues and their solutions. At the same time, this year I took a step forward in system operations — no longer focusing solely on technical problems and handling, but more on providing solutions. These should encompass thinking across the PostgreSQL database technology dimension, the system dimension, and the management dimension.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a simple classification of cloud DBA work:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2e7e4042bd88.png" alt="image-20251221120908749" /&gt;&lt;/p&gt;
&lt;p&gt;Many Ops papers only talk about incident handling, but in reality, incident handling probably accounts for less than 5% of actual operational workload. And whether in academia or practice, anomaly ops itself isn&amp;rsquo;t very effective anyway. So I&amp;rsquo;m not very bullish on AIOps being able to significantly help DBAs. Note that DBAs using AIOps and DBAs using AI are two different things.&lt;/p&gt;
&lt;p&gt;Actually, this diagram is just so-so, because it doesn&amp;rsquo;t include leadership tasks, which are definitely the bulk.&lt;/p&gt;
&lt;p&gt;Looking back at the 2023 and 2024 year-end summaries, I can simply summarize my DBA work year by year:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;2023: Comprehensive PostgreSQL learning&lt;/li&gt;
&lt;li&gt;2024: Comprehensive PostgreSQL operations&lt;/li&gt;
&lt;li&gt;2025: Responsible for 1510 emotional value&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What&amp;rsquo;s deeply ironic is that last year&amp;rsquo;s conclusion — &amp;ldquo;DBAs are providing 1510 emotional value to their leaders&amp;rdquo; — became my lived reality this year. I don&amp;rsquo;t want to say more about it. In short, it&amp;rsquo;s been exhausting, mentally draining. I hope next year brings improvement.&lt;/p&gt;

&lt;h2 class="relative group"&gt;READING
 &lt;div id="reading" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reading" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/710a408c1e3a.png" alt="image-20251220160932315" /&gt;&lt;/p&gt;
&lt;p&gt;This year I read even more books than last year (from 20+ to 30+), but wrote even fewer reading notes. Writing is indeed troublesome and energy-consuming, and I&amp;rsquo;ve grown to prefer the feeling of reading itself. Compared to last year, this year&amp;rsquo;s reading shows a clear decrease in PostgreSQL technical books, an increase in comprehensive technical books, and I even started reading psychology, economics, and philosophy. In short, broader hunting grounds, not limited to databases alone. Also fewer novels — novels are like snacks, and I&amp;rsquo;m increasingly losing interest in such non-nutritious content.&lt;/p&gt;
&lt;p&gt;This year&amp;rsquo;s book list generally falls into: IT Systems, Economics, Popular Science, Spiritual, and Fiction categories. As with last year, ranked &lt;strong&gt;by personal preference&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;IT Systems Book List:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;SRE: Google&amp;rsquo;s Approach to Service Reliability&amp;rdquo; — DBAs are not SREs, but their work involves system stability objectives, which has similarities with DBA work. Some content in this book about cloud environments or management aspects was truly enlightening — for example, SLA, systems engineering, operational pressure, busy work, role rotation, &amp;ldquo;trust the team rather than a single technical expert,&amp;rdquo; and more. Absolutely brilliant. Recently I also heard the term DBRE — Database Reliability Engineer — which fits my current role even better than DBA. In short, an excellent book, a must-read for modern ops.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Running Linux Kernel: Introduction&amp;rdquo; — operating open-source databases requires understanding the operating system. One of my books for studying Linux memory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Deep Understanding of Linux Processes and Memory&amp;rdquo; — one of my books for studying Linux memory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Understanding the Linux Kernel&amp;rdquo; — one of my books for studying Linux memory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Observability Engineering&amp;rdquo; — the patterns and flaws of traditional monitoring and traditional ops, and what observability essentially means. Quite helpful.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Economics Book List:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Microeconomics&amp;rdquo; — a masterpiece, by Daron Acemoglu. I consider it essential reading for life. This book has my best notes of any book. Not only understanding economics, but further understanding society. Some viewpoints left a deep impression on me:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Proves why the market is an invisible hand that maximizes social surplus value — any intervention reduces social surplus value.&lt;/li&gt;
&lt;li&gt;Under what circumstances markets are ineffective: externalities, public resources, and common-pool resources.&lt;/li&gt;
&lt;li&gt;Women earn less than men in the workplace partly because women bear children and cannot participate in production during that time.&lt;/li&gt;
&lt;li&gt;The function of academic credentials is signaling — to a certain degree, they certify the productive value of the person.&lt;/li&gt;
&lt;li&gt;Business entry and exit are normal market signals, not signs of disorder.&lt;/li&gt;
&lt;li&gt;The trade-off between equity and efficiency is a subject of study.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Why Nations Fail&amp;rdquo; — a masterpiece, by Daron Acemoglu. This book can be summarized in one sentence: Why do nations succeed? Because of creative destruction. Daron Acemoglu won the 2024 Nobel Prize in Economics for &amp;ldquo;research on how institutions are formed and how they affect prosperity.&amp;rdquo; What&amp;rsquo;s even more remarkable is that this book is easier to understand than other economics works. The top-recommended economics masterpiece.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Rational Optimist&amp;rdquo; — said to rival &amp;ldquo;Sapiens,&amp;rdquo; but it&amp;rsquo;s definitely a notch below. However, the content quality isn&amp;rsquo;t bad, and it&amp;rsquo;s more economics-oriented. Some viewpoints are very fresh, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Modern economics makes the rich richer, but the poor are not getting poorer.&lt;/li&gt;
&lt;li&gt;Self-sufficiency is poverty.&lt;/li&gt;
&lt;li&gt;What distinguishes humans from animals is barter exchange (in &amp;ldquo;Sapiens&amp;rdquo; it&amp;rsquo;s the cognitive revolution).&lt;/li&gt;
&lt;li&gt;Higher income leads to greater happiness — this is a fact.&lt;/li&gt;
&lt;li&gt;The elevation of trade in social status came from the rise of maritime trade, because land trade was unstable and easily plundered.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Reminiscences of a Stock Operator&amp;rdquo; — feels like I learned something and nothing at the same time. Decent read though.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;Game Theory&amp;rdquo; — honestly, I found it average. Not much content, quite superficial. I mainly read it because economics books keep mentioning game theory, so I flipped through it to evaluate.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Wealth of Nations&amp;rdquo; — extremely dense, not for normal people to read. Incredibly content-rich. Adam Smith must have been a genius — hard to imagine what kind of mind produced this. Too difficult for me, didn&amp;rsquo;t finish, gave up.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Popular Science Book List:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;A Brief History of Intelligence&amp;rdquo; — a masterpiece, essential reading for the AI era. This book is worn from my constant reading, covered in notes everywhere. Deconstructing the human brain, understanding what intelligence is, understanding how AI came to be. I give it full marks! Now whenever I see any animal, I first think about what intelligence level it&amp;rsquo;s at&amp;hellip;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;On Top of Tides&amp;rdquo; — by Wu Jun. Every IT professional should read this book. It tells the rise and fall of major IT companies. You can learn about Oracle, Google, Fairchild, Bell Labs, and even basics about venture capital. Every company has its own DNA, which is nearly unchangeable and determines the company&amp;rsquo;s culture and characteristics. A programmer&amp;rsquo;s must-read.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Almanack of Naval Ravikant&amp;rdquo; — has many useful perspectives, like views on marginal utility. And more importantly, it recommended one of my favorite books this year — &amp;ldquo;Microeconomics.&amp;rdquo; It also recommended meditation, which changed my habits.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;How to Manage a Software Company&amp;rdquo; — by Frank Slootman, a legendary Silicon Valley CEO who led three software companies (ServiceNow, Data Domain, Snowflake) to successful IPOs. A very good book, looking at company development, employee management, execution, decision-making, and decision failures from an IT company manager&amp;rsquo;s perspective. Highly recommended.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Economics of Aging&amp;rdquo; — by Kenichi Ohmae. Using Japan&amp;rsquo;s aging problem to glimpse China&amp;rsquo;s aging problems and opportunities. The demographic structural risks in our country are severe and about to come to a head. In this era, highly recommended reading.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Fourth Wave&amp;rdquo; — by Kenichi Ohmae. Mainly about how Japan missed the IT technology wave, still relying on old industries to support the national economy, appearing somewhat envious of South Korea and China. I personally love the author&amp;rsquo;s attitude of directly criticizing the prime minister, haha.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Checklist Manifesto&amp;rdquo; — explains the necessity of checklist inspections before Western surgical procedures. Seemingly simple steps can dramatically increase surgical success rates. This book had a big impact on my work — I genuinely brought the checklist concept into my work. I treat database operations like a surgical procedure — checklists are a simple yet necessary means to improve success rates.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Mythical Man-Month&amp;rdquo; — &amp;ldquo;adding people&amp;rdquo; cannot linearly reduce systems engineering project timelines, but you also can&amp;rsquo;t simply reject &amp;ldquo;adding people&amp;rdquo; because large systems engineering projects genuinely require many people collaborating. It&amp;rsquo;s a good book, but calling it a programmer&amp;rsquo;s must-read feels like a stretch.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Beauty of Mathematics&amp;rdquo; — by Wu Jun. Also quite good. Technology always has its mathematical foundations. This book accessibly tells the beauty of mathematics.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;McKinsey Structured Thinking&amp;rdquo; — any problem should be structurally decomposed. When I encounter new problems, I think this way. A useful book.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Chrysanthemum and the Sword&amp;rdquo; — stock from years ago that I dug out to read. An American&amp;rsquo;s post-WWII perspective on Japan. You can glimpse aspects of Japanese culture like modified Confucianism without &amp;ldquo;benevolence (ren),&amp;rdquo; the psychology of indebtedness, etc. One drawback is it&amp;rsquo;s quite dated — modern Japan is largely different from that era.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Black Swan&amp;rdquo; — a black swan refers to unforeseen extreme events. Black swan events will always happen — there&amp;rsquo;s no such thing as 100% accurate prediction. It also discusses classification, which reminded me of content from &amp;ldquo;Structured Thinking&amp;rdquo; and &amp;ldquo;The Worlds I See&amp;rdquo;: &amp;ldquo;The essence of human understanding is classifying things,&amp;rdquo; but classification always awkwardly leaves some things unclassifiable or unable to be classified. Black swan events exist from the moment of classification. An interesting and noteworthy reflection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&amp;ldquo;The Professional&amp;rdquo; — by Kenichi Ohmae. Very mediocre, not recommended.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Spiritual / Self-Help Book List:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&amp;ldquo;The Evolution of Desire&amp;rdquo; — evolutionary psychology, a masterpiece.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Die with Zero&amp;rdquo; — experience the right things at different life stages. Even if you revisit something after missing it, it won&amp;rsquo;t feel the same as experiencing it at the right time. A life manual, highly recommended.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Ten Minutes Meditation&amp;rdquo; — mainly about the importance of meditation and how to do it. I learned meditation through this book. When I first completed meditation, I fell in love with it. It gave me a feeling of being taken to outer space and then returning to Earth. More importantly, it truly relieves stress. Meditation has become part of my life.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The Manipulation Bible&amp;rdquo; — okay.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Siddhartha&amp;rdquo; — incomprehensible, rubbish.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The Book of Life&amp;rdquo; — pure chicken soup, rubbish.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Fiction Book List:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&amp;ldquo;The Stranger&amp;rdquo; — a masterpiece. An indescribable sense of authenticity, feeling like an outsider oneself.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Yellowface&amp;rdquo; — a very interesting book about a white American woman who plagiarizes an unpublished work by a deceased Asian writer, even using a very Chinese pen name. When fans discover she&amp;rsquo;s white, you can feel the embarrassment. Playfully explores racial prejudice. As thrilling as watching a TV drama — twists and turns, gripping. Highly recommended.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The World of Yesterday&amp;rdquo; — by Stefan Zweig. Austria, Europe, WWI and WWII through a writer&amp;rsquo;s eyes. Returning to that turbulent Europe from a different angle. A very good book.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Project Hail Mary&amp;rdquo; — sci-fi. I increasingly dislike reading sci-fi. This one is okay: imagine you&amp;rsquo;re on an alien exploration mission, all your crewmates have died, and you happen to encounter a friendly alien. How do you communicate with them&amp;hellip;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Letter from an Unknown Woman&amp;rdquo; — by Stefan Zweig. Not good. Only the first story is somewhat novel. No interest in seriously reading the other two.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Satantango&amp;rdquo; — incomprehensible. Even Nobel Prize in Literature winners vary in quality.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Blog and WeChat Official Account
 &lt;div id="blog-and-wechat-official-account" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#blog-and-wechat-official-account" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The name of my WeChat Official Account has always been a struggle. I didn&amp;rsquo;t put much thought into maintaining it anyway, so I casually used a few names. This year I watched a documentary — &amp;ldquo;The Last Porter&amp;rdquo; (最后的棒棒), which moved me deeply. The DBA profession, like the porters of Chongqing, is undergoing tremendous change. So I simply changed it to &amp;ldquo;最后的DBA&amp;rdquo; (The Last DBA). This name rolls off the tongue nicely and carries some historical context and philosophical reflection. Seems like a good name.&lt;/p&gt;
&lt;p&gt;Since a lot of time goes into work, I didn&amp;rsquo;t have much time for writing to begin with. Plus, this year my operational approach kept changing, and no matter how I adjusted my daily schedule, I couldn&amp;rsquo;t carve out a good time slot. I even invested some money, and my time still didn&amp;rsquo;t increase, which frustrated me for quite a while. Looking back now, I only published 12 articles this year — not even one in the first half of the year.&lt;/p&gt;
&lt;p&gt;Very dissatisfied. &amp;#x1f620;&lt;/p&gt;
&lt;p&gt;I don&amp;rsquo;t know if my skills have improved or if the system is genuinely stable, but cases worth deep research seem to have become fewer. But this isn&amp;rsquo;t really a big problem. This year I also started treating paper interpretation as an article type. I personally feel the results are decent — I can learn quite a bit, without being too insular or reinventing the wheel. Using AI to interpret papers would certainly be fast, but I personally feel there are two problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Do I truly understand? I feel like I don&amp;rsquo;t — it&amp;rsquo;s not the same concept as reading through it myself. Reading it yourself not only allows deeper understanding but also lets you discover all sorts of quirky details.&lt;/li&gt;
&lt;li&gt;Can&amp;rsquo;t pad articles. If I can interpret a paper with one prompt, then I feel the dissemination value is minimal — surely there&amp;rsquo;s no one not using AI now, right?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, I don&amp;rsquo;t read every paper word by word — that would be too inefficient. I only select papers that I feel are good and worth frame-by-frame interpretation, and savor them carefully.&lt;/p&gt;
&lt;p&gt;A quick summary of this year&amp;rsquo;s articles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Too few in quantity&lt;/li&gt;
&lt;li&gt;Slightly improved quality, and useful content (several articles I&amp;rsquo;m personally very satisfied with)&lt;/li&gt;
&lt;li&gt;Explored new formats&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Final Thoughts
 &lt;div id="final-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#final-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This year was a busy one, with both bad and good memories. Many important things were left unfinished. Next year should bring significant changes. Writing this year-end summary is quite interesting — looking back to see what my past selves were up to is a fun experience.&lt;/p&gt;
&lt;p&gt;Last year&amp;rsquo;s 2025 OKRs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Continue some things — FAILED&lt;/li&gt;
&lt;li&gt;Think about how to produce output — FAILED&lt;/li&gt;
&lt;li&gt;Master another track — HALF SUCCESSFUL&lt;/li&gt;
&lt;li&gt;PostgreSQL&amp;hellip; haven&amp;rsquo;t figured out what more to do — FAILED&lt;/li&gt;
&lt;li&gt;Find a way to resume fitness — FAILED&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;2026 Plan:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Continue some things&lt;/li&gt;
&lt;li&gt;Pay attention to my psychological and physical health — next year&amp;rsquo;s annual health inspection alerts should be lower than this year&amp;rsquo;s&lt;/li&gt;
&lt;li&gt;Pay attention to article readership, maintain the WeChat Official Account&lt;/li&gt;
&lt;li&gt;Explore DB AI Ops, report to myself next year&lt;/li&gt;
&lt;li&gt;Manage upward — don&amp;rsquo;t invest too much time in work&lt;/li&gt;
&lt;li&gt;Travel during holidays instead of grinding&lt;/li&gt;
&lt;li&gt;Read no fewer than 30 books, but don&amp;rsquo;t focus solely on quantity&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Paper Deep Read: DBAIOps</title><link>https://lastdba.com/en/2025/12/21/paper-deep-read-dbaiops/</link><pubDate>Sun, 21 Dec 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/12/21/paper-deep-read-dbaiops/</guid><description>&lt;p&gt;Paper: &lt;a href="https://www.arxiv.org/pdf/2508.01136" target="_blank" rel="noreferrer"&gt;DBAIOps: A Reasoning LLM-Enhanced Database Operation and Maintenance System using Knowledge Graphs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Repo: &lt;a href="https://github.com/weAIDB/DBAIOps/" target="_blank" rel="noreferrer"&gt;https://github.com/weAIDB/DBAIOps/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;What is DBAIOps
 &lt;div id="what-is-dbaiops" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-dbaiops" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Why DBAIOps:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Manual operations are extremely time-consuming.&lt;/li&gt;
&lt;li&gt;Manual operations are difficult to scale.&lt;/li&gt;
&lt;li&gt;Manual operations are often trapped in recurring failures.&lt;/li&gt;
&lt;li&gt;Documentation + RAG models are inaccurate (limited DBA experience integration).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, both manual operations and existing solutions are mediocre, hence DBAIOps — &lt;strong&gt;an operations system combining LLM reasoning and knowledge graphs to achieve DBA-like diagnostic capabilities&lt;/strong&gt;.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Paper: &lt;a href="https://www.arxiv.org/pdf/2508.01136" target="_blank" rel="noreferrer"&gt;DBAIOps: A Reasoning LLM-Enhanced Database Operation and Maintenance System using Knowledge Graphs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Repo: &lt;a href="https://github.com/weAIDB/DBAIOps/" target="_blank" rel="noreferrer"&gt;https://github.com/weAIDB/DBAIOps/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;What is DBAIOps
 &lt;div id="what-is-dbaiops" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-dbaiops" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Why DBAIOps:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Manual operations are extremely time-consuming.&lt;/li&gt;
&lt;li&gt;Manual operations are difficult to scale.&lt;/li&gt;
&lt;li&gt;Manual operations are often trapped in recurring failures.&lt;/li&gt;
&lt;li&gt;Documentation + RAG models are inaccurate (limited DBA experience integration).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In short, both manual operations and existing solutions are mediocre, hence DBAIOps — &lt;strong&gt;an operations system combining LLM reasoning and knowledge graphs to achieve DBA-like diagnostic capabilities&lt;/strong&gt;.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Comparison of database failure analysis approaches:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Rule-based approach: Traditional, rigid.&lt;/li&gt;
&lt;li&gt;Machine learning approach: Essentially rule-based with similar limitations; depends on training data leading to lower generation capability; generally suitable for diagnosing common specific problems.&lt;/li&gt;
&lt;li&gt;LLM-based approach: Uses general documentation and LLMs (e.g., decision-tree-based), prone to giving generic results.&lt;/li&gt;
&lt;li&gt;LLM+RAG approach: Searches based on chunked top-k approximate knowledge; results are inaccurate.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;After comparing the above approaches, the advantages of &lt;strong&gt;DBAIOps combining graph knowledge, DBA experience, and LLMs are clear:&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Incorporates DBA experience.&lt;/li&gt;
&lt;li&gt;Preserves original relationships.&lt;/li&gt;
&lt;li&gt;Supports new root cause identification and solutions.&lt;/li&gt;
&lt;li&gt;Extensible.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Overview
 &lt;div id="overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0901d29f4881.png" alt="image-20251214092938211" /&gt;&lt;/p&gt;
&lt;p&gt;Left side is architecture, right side is an example.&lt;/p&gt;
&lt;p&gt;Offline: DBA experience is embedded into Neo4j, with the resulting graph model called ExperienceGraph, where edges represent anomaly phenomena or metric relationships. The embedded anomaly model is called AnomalyModel.&lt;/p&gt;
&lt;p&gt;Online: Anomaly analysis, retrieval, and report generation. The AnomalyProcessor extracts standard failure information and AnomalyModel information, then retrieves the graph via ExperienceRetriever; finally, RootCauseAnalyzer calls the LLM to generate analysis reports.&lt;/p&gt;
&lt;p&gt;From the right-side example, we can see graph relevance finding LOG FILE SYNC associated with LOG WRITE performance and IO performance; through REDO ALLOCATION, we can find table structure changes and DDL.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Operations Experience Graph Model
 &lt;div id="the-operations-experience-graph-model" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-operations-experience-graph-model" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Unlike rule-based or document-chunk-based RAG, ExperienceGraph is a graph model encoding heterogeneous operations experience information. The graph contains three elements: (vertices, directed edges, relationships on edges).&lt;/p&gt;
&lt;p&gt;Based on the characteristics of operations experience, DBAIOps classifies vertices:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;trigger vertex: Used to detect database anomalies; the entry point for anomaly analysis. For example, LOG FILE SYNC is an entry vertex.&lt;/li&gt;
&lt;li&gt;metric vertex: Database runtime metrics. For offline knowledge, this refers to metrics from operations case studies (if present).&lt;/li&gt;
&lt;li&gt;experience vertex: Encodes domain-specific operations experience, covering anomaly meanings and handling methods. For example, LOG FILE SYNC exceeding 60ms indicates overly frequent commits or parameter adjustments needed.&lt;/li&gt;
&lt;li&gt;tool vertex: Executable scripts for collecting and analyzing anomaly metrics.&lt;/li&gt;
&lt;li&gt;tag vertex: Semantic categories of graph vertices. For example, &amp;ldquo;Concurrent Transactions&amp;rdquo; involves multiple vertex types; tag vertices strengthen cross-case associations.&lt;/li&gt;
&lt;li&gt;auxiliary vertex: Explains the meaning of metrics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Edge classification:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;containment edge: Trigger Vertex - Experience Vertex&lt;/li&gt;
&lt;li&gt;relevance edge: Trigger Vertex - Metric Vertex&lt;/li&gt;
&lt;li&gt;diagnosis edge: Experience Vertex - Metric Vertex&lt;/li&gt;
&lt;li&gt;synonym edge: Only appears between Tag Vertices, indicating semantic synonymy, e.g., physical_read and disk_read; shared_pool and shared_buffer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Analyzing the operations experience graph model through an example:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/af2763d6d88b.png" alt="image-20251215210049114" /&gt;&lt;/p&gt;
&lt;p&gt;LOG FILE SYNC has multiple TAGs, and TAGs are associated with Experience, metrics, and tools. The strong relevance is evident — it represents a human DBA&amp;rsquo;s understanding and operations experience of LOG FILE SYNC.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Graph Construction
 &lt;div id="graph-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#graph-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Manual graph construction is unreliable, and existing ML-generated graphs may generate irrelevant relationships, so a semi-automatic graph generation approach is proposed.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Graph initialization: This part is manually generated, defining trigger vertices according to rules. Once trigger vertices are generated, their associated metric vertices, experience vertices, etc., are automatically generated. This is somewhat like a human DBA guiding the creation of a knowledge sketch — the overall framework cannot be changed; nothing bizarre should be generated.&lt;/li&gt;
&lt;li&gt;Graph storage: Stored in Neo4J. Additionally, different database types are marked with tags, making much knowledge reusable and avoiding duplicate graph construction.&lt;/li&gt;
&lt;li&gt;Graph augmentation: Generating more edges.&lt;/li&gt;
&lt;li&gt;Graph updates: DBAIOps supports incremental updates. Updates here include both adding new vertices and removing old vertices.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Anomaly Model
 &lt;div id="anomaly-model" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#anomaly-model" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Metrics
 &lt;div id="metrics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#metrics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Metrics come from many sources, including runtime information (CPU %, throughput, etc., routine monitoring), logs, traces, etc. Combined with relevance differences, strongly correlated metrics need to be extracted. So metrics are divided into 2 categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Immediately collected metrics: Runtime information, logs, traces.&lt;/li&gt;
&lt;li&gt;Subsequently collected metrics: Periodic, delta, etc., metrics generated when needed, such as AWR/ASH data.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Regarding metric-anomaly correlation, unlike baseline-based approaches, DBAIOps uses specific metric combinations for each anomaly type.&lt;/p&gt;
&lt;p&gt;Finally, a formula determines whether an anomaly has actually occurred:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3693e91d7723.png" alt="image-20251214093339574" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Two-Stage Graph Evolution
 &lt;div id="two-stage-graph-evolution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#two-stage-graph-evolution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Database anomalies rarely occur in isolation — one performance issue may simultaneously trigger or exacerbate others. However, connections between different anomaly models (e.g., LOG_FILE_SYNC and REDO_ALLOCATION) in pre-built knowledge graphs tend to be loose, with shared experience fragments sparse and fragmented. This makes it difficult for traditional methods to discover cross-model composite root causes, such as combined I/O bottleneck and memory pressure issues.&lt;/p&gt;
&lt;p&gt;To address this challenge, DBAIOps proposes an automatic &amp;ldquo;graph evolution&amp;rdquo; mechanism that dynamically discovers and connects relevant experience fragments between different anomaly models, evolving the knowledge graph from an initially sparse structure into a densely interconnected network, thus supporting more comprehensive root cause analysis.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Stage 1 - Graph Inference and Proximity Discovery: Uses graph query language (Cypher) to collect and aggregate relevant metrics, traversing related nodes and edges based on configurable thresholds to build association networks. For example, starting from LOG_FILE_SYNC latency, traverse up to 3 hops of associated nodes. Establish connections between LOG_FILE_SYNC and REDO_ALLOCATION models because they are both related to I/O-related concurrency issues. Through multiple iterations, the knowledge graph gradually evolves into a denser structure, enabling diagnosis to consider more potential factors and composite causes.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stage 2 - Adaptive Abnormal Metric Detection: Identifies truly anomalous metrics along graph expansion paths. Using an Adaptive Detection Function (ADF), it calculates composite anomaly scores considering dimensions such as metric volatility and dynamic baseline deviation. Based on anomaly scoring results, it decides whether further knowledge graph structure expansion is needed, filtering a precise subset of anomaly metrics for subsequent LLM root cause reasoning.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/313bde49387f.png" alt="image-20251214103841593" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Generating Analysis Reports
 &lt;div id="generating-analysis-reports" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#generating-analysis-reports" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Once the graph is ready, prompts need to be fed to the LLM to generate desired reports. A well-structured prompt can also improve report accuracy.&lt;/p&gt;
&lt;p&gt;Anomalies have 5 components, which serve as the prompt for the LLM:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Anomaly: Anomaly description (&amp;ldquo;CPU usage spiked to 95% at 16:00 on 2023-10-05&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;Condition: Anomaly trigger condition (&amp;ldquo;exceeds 90% for &amp;gt;5 min&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Experience: Provides normal load values or recent maintenance tasks.&lt;/li&gt;
&lt;li&gt;Output: Describes the report&amp;rsquo;s composition — anomaly verification (requiring further analysis), root cause analysis, recovery plan, summary, SQL text.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Some personal thoughts&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;Recent maintenance tasks are very useful — maintenance tasks generally have strong correlation, and failure analysis can&amp;rsquo;t just be simple technical analysis. However, who updates these maintenance tasks and which ones to update or not update is a problem.&lt;/p&gt;
&lt;p&gt;The first few items in output are easy to understand, but the last one — SQL text — is a stroke of genius. In production environments, aside from hardware failures, database runtime status is strongly correlated with SQL. I personally believe you can unthinkingly capture SQL and discuss causality later. From an operations perspective, failures always require joint investigation with developers, so SQL text is basically mandatory to capture.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Evaluation
 &lt;div id="evaluation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#evaluation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Comparison of analysis report quality across different tools and approaches:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2f2ecc9b755b.png" alt="image-20251215082259815" /&gt;&lt;/p&gt;
&lt;p&gt;Impressive results. Notably, DBAIOps specifically emphasizes that mid-sized LLMs already produce good analysis results. This is important — DeepSeek-R1 671B running bare isn&amp;rsquo;t bad, but the cost is on a completely different level.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Nitpicking
 &lt;div id="nitpicking" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#nitpicking" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Can&amp;rsquo;t really be called &amp;ldquo;Ops&amp;rdquo; — it only has failure analysis functionality. Ops content is vast; failure analysis is just the tip of the iceberg.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Graph classification doesn&amp;rsquo;t match the graph example. The defined tag vertices and edges differ significantly from the example.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The vertices in the example play important roles, but these edge types aren&amp;rsquo;t defined: tag vertex-tool vertex, tag vertex-experience vertex, tag vertex-metric vertex. And the edges that should exist seem mostly absent, with only synonym edges present.&lt;/p&gt;
&lt;p&gt;Undescribed parts of the example should be listed, otherwise it&amp;rsquo;s confusing.&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;The two-stage graph evolution results are a bit odd:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5fd6929e7dee.png" alt="image-20251214165952773" /&gt;&lt;/p&gt;
&lt;p&gt;w/o ADF means without Stage 2 graph evolution (adaptive abnormal metric detection).
w/o ADF should mean without Stage 1 graph evolution (graph inference and proximity discovery).
w/o ADF means without either stage of graph evolution.&lt;/p&gt;
&lt;p&gt;Here, the case with both stages of graph evolution is missing — having it would better demonstrate the effectiveness of two-stage graph evolution.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Root causes are somewhat limited:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4e7faebf0b1a.png" alt="image-20251214114018609" /&gt;&lt;/p&gt;
&lt;p&gt;The circled ones should be relatively common (I only looked at Oracle and Postgres), but these root causes are currently absent.&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s root causes are a bit sparse. Dirty page flushing generally isn&amp;rsquo;t a major issue — as a root cause, it probably ranks behind many other root causes.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Points I personally really like:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;GraphRAG should be better than vector RAG for failure diagnosis.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/eddb311c9614.png" alt="image-20251215212534234" /&gt;&lt;/p&gt;
&lt;p&gt;(GraphRAG original paper: &lt;a href="https://arxiv.org/pdf/2404.16130" target="_blank" rel="noreferrer"&gt;From Local to Global: A GraphRAG Approach to Query-Focused Summarization&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;SS represents vector RAG, TS represents source text summaries, and C0/C1/C2/C3 represent GraphRAG at different knowledge granularities. From this chart, we can simply conclude: GraphRAG is better suited for multi-document complex scenarios and multi-angle analysis, but may not necessarily outperform vector RAG in precision.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Semi-automatic graph generation approach.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Graph generation is semi-automatic — trigger vertices are manually created, others can be auto-generated. For example, LOG FILE SYNC is a trigger vertex. Failure entry points can indeed be made into obvious anomaly points — these are the entry points. Same for PG, same for any failure — it aligns with human logic for understanding failures.&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Automatic graph evolution.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Strengthening associations between certain vertices is meaningful, as evident from the &amp;ldquo;Performance of DBAIOps Variants&amp;rdquo; table.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Automatic baseline adjustment.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In &lt;em&gt;Observability Engineering&lt;/em&gt;, there&amp;rsquo;s this passage about AIOps:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;AI can only help when there are clearly discernible patterns and it can identify shifting baselines for prediction — such AIOps doesn&amp;rsquo;t exist yet.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;DBAIOps in my eyes:&lt;/p&gt;
&lt;p&gt;Clearly discernible patterns = DBAIOps&amp;rsquo;s graph, which includes failure models, anomaly relationships, monitoring data, and logs.&lt;/p&gt;
&lt;p&gt;Shifting baselines = DBAIOps&amp;rsquo;s adaptive abnormal metric detection.&lt;/p&gt;
&lt;p&gt;In summary, it&amp;rsquo;s a significant advancement over random chunking of failure knowledge, setting a single baseline, and vector approximate search in RAG models.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Original link: &lt;a href="https://lastdba.com/2025/12/21/" target="_blank" rel="noreferrer"&gt;https://lastdba.com/2025/12/21/&lt;/a&gt;论文精读dbaio-ps/&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>From collation mismatch Exception to Its Principles</title><link>https://lastdba.com/en/2025/12/13/from-collation-mismatch-exception-to-its-principles/</link><pubDate>Sat, 13 Dec 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/12/13/from-collation-mismatch-exception-to-its-principles/</guid><description>&lt;h2 class="relative group"&gt;Problem Phenomenon
 &lt;div id="problem-phenomenon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-phenomenon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After physical migration to Xinchuang, occasional errors appear in the pg log, version pg15:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: 01000: collation &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has version mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The collation in the database was created using version 2.17, but the operating system provides version 2.28.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild all objects affected by this collation and run ALTER COLLATION pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH VERSION, or build RaseSQL with the right library version.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCATION: pg_newlocale_from_collation, pg_locale.c:1660&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Context: During the physical switch, invalid index rebuilding and refresh database collation version were performed.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Phenomenon
 &lt;div id="problem-phenomenon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-phenomenon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After physical migration to Xinchuang, occasional errors appear in the pg log, version pg15:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: 01000: collation &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has version mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The collation in the database was created using version 2.17, but the operating system provides version 2.28.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild all objects affected by this collation and run ALTER COLLATION pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH VERSION, or build RaseSQL with the right library version.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCATION: pg_newlocale_from_collation, pg_locale.c:1660&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Context: During the physical switch, invalid index rebuilding and refresh database collation version were performed.&lt;/p&gt;
&lt;p&gt;Although the libc version was upgraded after physical migration, indexes were rebuilt and are now valid, and the collation version in the database is already consistent with the OS libc.&lt;/p&gt;
&lt;p&gt;So,&lt;/p&gt;
&lt;p&gt;Why is the error reported?&lt;/p&gt;
&lt;p&gt;Where is the error triggered?&lt;/p&gt;
&lt;p&gt;What is the impact of the error?&lt;/p&gt;
&lt;p&gt;How to resolve it?&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Analysis
 &lt;div id="problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Why is the error reported?
 &lt;div id="why-is-the-error-reported" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-is-the-error-reported" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The collation inside the database mainly involves 3 aspects: database, columns, and indexes. The first two use default collation, and the index collation is the real collation.&lt;/p&gt;
&lt;p&gt;First, check the database collation. All databases use en_US.UTF8, and refresh database collation has already been done, so the &amp;ldquo;collation &amp;quot;zh_CN.utf8&amp;quot; has version mismatch&amp;rdquo; error should not be thrown at the database layer.&lt;/p&gt;
&lt;p&gt;Then check columns without specially specified default collation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; attrelid,attname,attcollation &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_attribute &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; attcollation &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;950&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;951&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; attrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; attname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; attcollation 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;0 means no collation, default oid=100, C oid=950, POSIX oid=951; &amp;ldquo;zh_CN.utf8&amp;rdquo; definitely won&amp;rsquo;t be any of these four.&lt;/p&gt;
&lt;p&gt;Finally, check indexes without specially specified collation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; indexrelid ,&lt;span style="color:#66d9ef"&gt;unnest&lt;/span&gt;(indcollation) coll &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index) i &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; coll &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;950&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;951&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; indexrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; coll 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Having ruled out database, columns, and indexes, only one situation remains: the application layer specifies a sort rule:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; l(col1) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: &lt;span style="color:#ae81ff"&gt;01000&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; was created &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;, but the operating &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; provides &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; objects affected &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; this &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; run &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; build RaseSQL &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;right&lt;/span&gt; library &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: pg_newlocale_from_collation, pg_locale.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1660&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This zh_CN.utf8 version is inconsistent with the actual one:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; collname,collversion,pg_collation_actual_version(oid) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.utf8&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Not only zh_CN.utf8 is different, all are different (except a few collations without version concept).&lt;/p&gt;
&lt;p&gt;So it&amp;rsquo;s very likely that the application itself specified a sort rule &amp;ldquo;zh_CN.utf8&amp;rdquo;, but the coll version in the database is inconsistent with the OS, which triggered the error.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Source Code Understanding
 &lt;div id="source-code-understanding" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-understanding" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The error message makes it easy to locate the source code position. Two main functions are of interest: &lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; and &lt;code&gt;CheckMyDatabase&lt;/code&gt;.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; Caching and Checking &lt;code&gt;pg_collation&lt;/code&gt;
 &lt;div id="pg_newlocale_from_collation-caching-and-checking-pg_collation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_newlocale_from_collation-caching-and-checking-pg_collation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; was introduced in pg10.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Create a locale_t from a collation OID. Results are cached for the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * lifetime of the backend. Thus, do not free the result with freelocale().
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * As a special optimization, the default/database collation returns 0.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Callers should then revert to the non-locale_t-enabled code path.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In fact, they shouldn&amp;#39;t call this function at all when they are dealing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * with the default locale. That can save quite a bit in hotspots.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Also, callers should avoid calling this before going down a C/POSIX
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * fastpath, because such a fastpath should work even on platforms without
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * locale_t support in the C library.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * For simplicity, we always generate COLLATE + CTYPE even though we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * might only need one of them. Since this is called only once per session,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * it shouldn&amp;#39;t cost much.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* locale_t means non-ICU. This function caches a locale_t type collation OID for the backend
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;* the default/database collation returns 0. &amp;#34;default&amp;#34; means using the database&amp;#39;s collation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;pg_locale_t&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pg_newlocale_from_collation&lt;/span&gt;(Oid collid) &lt;span style="color:#75715e"&gt;// Note: passes in collation oid, not fetching all pg_collation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Return 0 for &amp;#34;default&amp;#34; collation, just in case caller forgets */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (collid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; DEFAULT_COLLATION_OID) &lt;span style="color:#75715e"&gt;// Three special collations:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;pg_locale_t&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; &lt;span style="color:#75715e"&gt;// default oid=100, C oid=950, POSIX oid=951
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (cache_entry&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;locale &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		collversion &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;SysCacheGetAttr&lt;/span&gt;(COLLOID, tp, Anum_pg_collation_collversion,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;isnull); &lt;span style="color:#75715e"&gt;// Get version from pg_collation data dictionary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;isnull)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			actual_versionstr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;get_collation_actual_version&lt;/span&gt;(collform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;collprovider, collcollate); &lt;span style="color:#75715e"&gt;// Get actual version via get_collation_actual_version
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			collversionstr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TextDatumGetCString&lt;/span&gt;(collversion);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(actual_versionstr, collversionstr) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// Compare data dictionary version and actual version, throw error if different
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(WARNING,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;collation &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; has version mismatch&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								&lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(collform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;collname)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errdetail&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;The collation in the database was created using version %s, &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#e6db74"&gt;&amp;#34;but the operating system provides version %s.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 collversionstr, actual_versionstr),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Rebuild all objects affected by this collation and run &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#e6db74"&gt;&amp;#34;ALTER COLLATION %s REFRESH VERSION, &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#e6db74"&gt;&amp;#34;or build PostgreSQL with the right library version.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#a6e22e"&gt;quote_qualified_identifier&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;get_namespace_name&lt;/span&gt;(collform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;collnamespace),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;															&lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(collform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;collname)))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; cache_entry&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;locale;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The main check is: through the coll oid, check whether the version in the pg_collation data dictionary is consistent with the actual version; if inconsistent, throw an error.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;code&gt;CheckMyDatabase&lt;/code&gt; Caching and Checking &lt;code&gt;pg_database&lt;/code&gt;
 &lt;div id="checkmydatabase-caching-and-checking-pg_database" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checkmydatabase-caching-and-checking-pg_database" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;CheckMyDatabase&lt;/code&gt; has existed for a long time, performing many database-side checks. However, pg15 added logic for checking the database version.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * CheckMyDatabase -- fetch information from the pg_database entry for our DB
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CheckMyDatabase&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;name, &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; am_superuser, &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; override_allow_connections)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Fetch our pg_database row normally, via syscache */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	tup &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;SearchSysCache1&lt;/span&gt;(DATABASEOID, &lt;span style="color:#a6e22e"&gt;ObjectIdGetDatum&lt;/span&gt;(MyDatabaseId));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	default_locale.provider &lt;span style="color:#f92672"&gt;=&lt;/span&gt; dbform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;datlocprovider; &lt;span style="color:#75715e"&gt;// default is the db&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Default locale is currently always deterministic. Nondeterministic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * locales currently don&amp;#39;t support pattern matching, which would break a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * lot of things if applied globally.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	default_locale.deterministic &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// byte-order sensitive
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Check collation version. See similar code in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * pg_newlocale_from_collation(). Note that here we warn instead of error
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * in any case, so that we don&amp;#39;t prevent connecting.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	datum &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;SysCacheGetAttr&lt;/span&gt;(DATABASEOID, tup, Anum_pg_database_datcollversion,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;isnull); &lt;span style="color:#75715e"&gt;// Get datcollversion from pg_database
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;isnull)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;actual_versionstr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;collversionstr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		collversionstr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TextDatumGetCString&lt;/span&gt;(datum);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		actual_versionstr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;get_collation_actual_version&lt;/span&gt;(dbform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;datlocprovider, dbform&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;datlocprovider &lt;span style="color:#f92672"&gt;==&lt;/span&gt; COLLPROVIDER_ICU &lt;span style="color:#f92672"&gt;?&lt;/span&gt; iculocale : collate); &lt;span style="color:#75715e"&gt;// Get actual version via get_collation_actual_version
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(actual_versionstr, collversionstr) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// Compare db datcollversion and actual version, throw warning if not equal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(WARNING,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;database &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; has a collation version mismatch&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							name),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errdetail&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;The database was created using collation version %s, &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#e6db74"&gt;&amp;#34;but the operating system provides version %s.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 collversionstr, actual_versionstr),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Rebuild all objects in this database that use the default collation and run &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#e6db74"&gt;&amp;#34;ALTER DATABASE %s REFRESH COLLATION VERSION, &amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#e6db74"&gt;&amp;#34;or build PostgreSQL with the right library version.&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#a6e22e"&gt;quote_identifier&lt;/span&gt;(name))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;CheckMyDatabase&lt;/code&gt; function compares the datcollversion in the pg_database data dictionary with the actual version.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Function Differences
 &lt;div id="function-differences" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#function-differences" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;In pg14 and before, there was only 1 collation comparison logic: when a session first caches the corresponding collation, it calls &lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; to access &lt;strong&gt;the version of the corresponding collation in the pg_collation data dictionary&lt;/strong&gt; and compare it with the real version.&lt;/li&gt;
&lt;li&gt;In PG15 and later, because the datcollversion field was added to the pg_database table, a new logic for checking db collation version was added: when a session first accesses the db in pg_database, it calls &lt;code&gt;CheckMyDatabase&lt;/code&gt; to check &lt;strong&gt;the datcollversion of the corresponding database in pg_database&lt;/strong&gt; and compare it with the real version.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Why Are There Fewer Errors After Only Refreshing the Database?
 &lt;div id="why-are-there-fewer-errors-after-only-refreshing-the-database" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-are-there-fewer-errors-after-only-refreshing-the-database" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After refreshing the database collation version, the warning about inconsistent pg_database coll version won&amp;rsquo;t be triggered, but it still cannot rule out the situation where pg_collation&amp;rsquo;s coll version is inconsistent. Why are there so many fewer errors after only refreshing the database? Could it be that pg_collation&amp;rsquo;s coll version simply won&amp;rsquo;t be loaded?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.coll,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;unnest&lt;/span&gt;(indcollation) coll &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index ) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.coll;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; coll &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;950&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt; &lt;span style="color:#75715e"&gt;--C
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2841&lt;/span&gt; &lt;span style="color:#75715e"&gt;--No collation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;723&lt;/span&gt; &lt;span style="color:#75715e"&gt;--default&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In real environments, default is the most used. Generally, no one specifies a collation; if not specified it&amp;rsquo;s default, and default is the database&amp;rsquo;s default collation.&lt;/p&gt;
&lt;p&gt;Here we need to revisit the &lt;code&gt;pg_newlocale_from_collation&lt;/code&gt; function. The function starts like this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;pg_locale_t&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pg_newlocale_from_collation&lt;/span&gt;(Oid collid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	collation_cache_entry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cache_entry;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Callers must pass a valid OID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;OidIsValid&lt;/span&gt;(collid));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Return 0 for &amp;#34;default&amp;#34; collation, just in case caller forgets */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (collid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; DEFAULT_COLLATION_OID)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;pg_locale_t&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;collid==DEFAULT_COLLATION_OID&lt;/code&gt;==100, it directly &lt;code&gt;return&lt;/code&gt;s without executing the real version check below, so it won&amp;rsquo;t throw a warning. This logic is reasonable because the db coll version has already been verified when logging into the database; if there&amp;rsquo;s a problem, a warning must have already been thrown at the session layer.&lt;/p&gt;
&lt;p&gt;Furthermore, even if a possible value like collid=37 is passed in, the corresponding C also has no version concept.&lt;/p&gt;
&lt;p&gt;Therefore, after refreshing the database, in the vast majority of scenarios, as long as the database&amp;rsquo;s internal sorting is used (not expression sorting or specified index sorting), no error will be thrown.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Testing
 &lt;div id="testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Here we only test whether there is a refresh warning, not testing index corruption or database crashes.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Check libc version&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;getconf GNU_LIBC_VERSION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Source host version glibc 2.17
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Target host glibc 2.28
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg version pg15+&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Test: Refresh db without refreshing pg_collation, only db coll version changes
 &lt;div id="test-refresh-db-without-refreshing-pg_collation-only-db-coll-version-changes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-refresh-db-without-refreshing-pg_collation-only-db-coll-version-changes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; datname,datlocprovider,datcollate,datctype,datcollversion &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_database 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datlocprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datctype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datcollversion 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------------+-------------+-------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; collname,collprovider,collversion,pg_collation_actual_version(oid) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;en_US.utf8&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; lzldb refresh &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NOTICE: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: changing &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: AlterDatabaseRefreshColl, dbcommands.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2399&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATABASE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Check pg_collation and pg_database again:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datlocprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datctype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datcollversion 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------------+-------------+-------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Consistent with the official documentation description: refresh database collation version only refreshes the db&amp;rsquo;s default collation; pg_collation itself won&amp;rsquo;t change.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Test: Refresh db without refreshing pg_collation, specifying expression sort reports warning
 &lt;div id="test-refresh-db-without-refreshing-pg_collation-specifying-expression-sort-reports-warning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-refresh-db-without-refreshing-pg_collation-specifying-expression-sort-reports-warning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;As analyzed at the beginning, expression sorting will report a warning, omitted.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Test: Refresh db without refreshing pg_collation, creating a new index with specified collation reports warning
 &lt;div id="test-refresh-db-without-refreshing-pg_collation-creating-a-new-index-with-specified-collation-reports-warning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-refresh-db-without-refreshing-pg_collation-creating-a-new-index-with-specified-collation-reports-warning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Test 1: Specify collation when creating index&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx11 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tt(a &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: &lt;span style="color:#ae81ff"&gt;01000&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; was created &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;, but the operating &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; provides &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; objects affected &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; this &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; run &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; build PostgreSQL &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;right&lt;/span&gt; library &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: pg_newlocale_from_collation, pg_locale.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1664&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Test 2: Specify column default collation when creating table, don&amp;rsquo;t specify when creating index&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb &lt;span style="color:#75715e"&gt;-- Reconnect a session
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; now connected &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; ttt(a varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxttt &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; ttt(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNING: &lt;span style="color:#ae81ff"&gt;01000&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; mismatch
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; was created &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;, but the operating &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; provides &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Rebuild &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; objects affected &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; this &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; run &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; pg_catalog.&lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt; REFRESH &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; build PostgreSQL &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;right&lt;/span&gt; library &lt;span style="color:#66d9ef"&gt;version&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: pg_newlocale_from_collation, pg_locale.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1664&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;904&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Column default collation and index specification of collation are essentially the same thing, both for specifying the index&amp;rsquo;s collation. Both can report warnings.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Test: Refresh db without refreshing pg_collation, existing index with specified collation does not report warning
 &lt;div id="test-refresh-db-without-refreshing-pg_collation-existing-index-with-specified-collation-does-not-report-warning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-refresh-db-without-refreshing-pg_collation-existing-index-with-specified-collation-does-not-report-warning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Scenario: The original database already has an index specifying collation zh_CN.utf8, different from the db. Refreshing the db won&amp;rsquo;t catch it. But after migrating to a new database, the vendor&amp;rsquo;s coll version definitely changed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; collname,collprovider,collversion,pg_collation_actual_version(oid) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.utf8&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collprovider &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collversion &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_collation_actual_version 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------------+-------------+-----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without using expression sorting, the index can be used, but index sorting cannot be used:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_seqscan &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANALYZE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6667&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6670&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;928&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;145&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6667&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6892&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;81&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90004&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;926&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;021&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Method&lt;/span&gt;: top&lt;span style="color:#f92672"&gt;-&lt;/span&gt;N heapsort Memory: &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxtt &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tt (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1732&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90004&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;029&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;434&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90004&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Fetches: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Existing indexes with specified collation do not report warnings when used.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary of This Problem
 &lt;div id="summary-of-this-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary-of-this-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The refresh database and refresh collation warnings are session-level. In each session, for each database or each collation, it only reports once.&lt;/p&gt;
&lt;p&gt;Only refreshing the database very likely won&amp;rsquo;t report warnings again, but there are situations where creating an index with a specified collation or running SQL with specified expression collation may still report warnings.&lt;/p&gt;
&lt;p&gt;The coll version in the data dictionary is only for tracking whether the collation provider version has changed at the database layer. Imagine if there were no coll version in the data dictionary - the database might not even be able to return a warning saying &amp;ldquo;your sort rule provider has upgraded its version, your data sorting might have problems, you need to check it&amp;rdquo; (and of course it&amp;rsquo;s not just about sorting).&lt;/p&gt;

&lt;h2 class="relative group"&gt;Solutions for This Problem
 &lt;div id="solutions-for-this-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solutions-for-this-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Corrupt indexes have already been rebuilt, the database has been refreshed, only collation hasn&amp;rsquo;t been refreshed. The inconsistency of coll version in the data dictionary is not a big problem, it&amp;rsquo;s just a warning. As for other hidden and strange pitfalls, refer to the more section.&lt;/p&gt;
&lt;p&gt;Solution for this problem:&lt;/p&gt;
&lt;p&gt;Step 1: Check if there are still dependencies&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_describe_object(refclassid, refobjid, refobjsubid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;Collation&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_describe_object(classid, objid, objsubid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;Object&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_depend d &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; refclassid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;pg_collation&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; refobjid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collversion &lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt; pg_collation_actual_version(&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If there are returns, it&amp;rsquo;s best to rebuild the dependent objects; if not, follow step 2:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Solution 1: Do nothing. If there aren&amp;rsquo;t many warnings, leaving them alone is fine.&lt;/li&gt;
&lt;li&gt;Solution 2: Only refresh collation zh_CN.UTF8. Fix one as it comes.&lt;/li&gt;
&lt;li&gt;Solution 3: Refresh all collations. Even if the application incrementally uses expressions or index-specified collation, no warnings will be reported.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;More
 &lt;div id="more" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#more" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Key Summary of glibc Upgrade Related Issues
 &lt;div id="key-summary-of-glibc-upgrade-related-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#key-summary-of-glibc-upgrade-related-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Locale is a very tricky area, and glibc upgrades cause many collation-related problems. Referencing reference materials, here&amp;rsquo;s a summary of some important points:&lt;/p&gt;
&lt;p&gt;pg_collation is obtained from the OS command &lt;code&gt;locale -a&lt;/code&gt;; the provider is basically glibc, so you need to look at the glibc version.&lt;/p&gt;
&lt;p&gt;In pg_collation, &amp;ldquo;C&amp;rdquo; and &amp;ldquo;posix&amp;rdquo; have collprovider &lt;code&gt;c&lt;/code&gt;, which looks the same as &amp;ldquo;C.UTF8&amp;rdquo; etc., but they&amp;rsquo;re not. &amp;ldquo;C.UTF8&amp;rdquo;&amp;rsquo;s provider is glibc, &lt;strong&gt;has a version, generally Unicode codepoint sorting or Unicode semantic sorting&lt;/strong&gt;; &amp;ldquo;C&amp;rdquo; and &amp;ldquo;POSIX&amp;rdquo; are equivalent, the most basic locale defined by the POSIX standard, implemented by libc, not in &lt;code&gt;locale -a&lt;/code&gt;, &lt;strong&gt;has no version, sorts directly by byte order&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Root cause of collation problems: The database requires that locale definitions never change during the database lifecycle, but OS vendors, especially the GNU C library, make changes to locale in every minor version, and this is legitimate.&lt;/p&gt;
&lt;p&gt;GNU C library makes changes to locale in every minor version. The version most prone to problems in reality is &lt;strong&gt;glibc 2.28&lt;/strong&gt;, because 2.28 upgraded the major version &lt;strong&gt;unicode 9.0.0&lt;/strong&gt; (&lt;a href="https://sourceware.org/glibc/wiki/Release/2.28" target="_blank" rel="noreferrer"&gt;has been updated to a new upstream version from ISO which is in sync with Unicode 9.0.0&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pg has no way to detect compatibility issues caused by glibc upgrades&lt;/strong&gt;. Index corruption checking is not an all-check, and indexes are only one aspect. After physical replication or upgrade, even if indexes are rebuilt, you cannot rule out the possibility that the database crashes one day due to collation version issues.&lt;/p&gt;
&lt;p&gt;Data anomalies include: duplicate primary keys, sort-dependent constraints, range partition table data written to wrong partitions, mergejoin and other sort operations, etc.&lt;/p&gt;
&lt;p&gt;Character types depend on collation. Data types that don&amp;rsquo;t depend on collation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;bytea&lt;/li&gt;
&lt;li&gt;tsvector gin indexes&lt;/li&gt;
&lt;li&gt;pg_trgm indexes&lt;/li&gt;
&lt;li&gt;numeric data types: int, bigint, numeric, float, &amp;hellip;&lt;/li&gt;
&lt;li&gt;custom data types like geometry (PostGIS)&lt;/li&gt;
&lt;li&gt;timestamp&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;ASCII sorting is relatively common but doesn&amp;rsquo;t conform to human understanding, i.e., not semantic. Semantically conforming international sorting standards are generally Unicode standards.&lt;/p&gt;
&lt;p&gt;Unicode-based sorting rules are divided into 2 types: codepoint sorting, UCA (Unicode Collation Algorithm).&lt;/p&gt;
&lt;p&gt;UCA is based on DUCET (Default Unicode Collation Element Table). The DUCET table itself may have sorting changes between different versions. For example, en_US.UTF8 is UCA sorting, equivalent to semantic sorting; version upgrades will change sorting rules. C.UTF8 is codepoint sorting; once codepoints are confirmed they won&amp;rsquo;t change, and sorting rules won&amp;rsquo;t change.&lt;/p&gt;
&lt;p&gt;PG 17+ provides a very safe locale provider method: builtin, no longer depending on OS-provided glibc, ICU and other providers. Example enable command:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb &lt;span style="color:#75715e"&gt;--locale-provider=builtin --bultin-locale=C.UTF-8 dbname1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;17 only supports C, C.UTF-8. C is byte-order sorting (approximately ASCII sorting), C.UTF-8 is Unicode codepoint sorting; 18 adds one more PG_UNICODE_FAST, also Unicode codepoint sorting, with &lt;a href="https://www.postgresql.org/docs/18/locale.html#LOCALE-PROVIDERS" target="_blank" rel="noreferrer"&gt;slight differences&lt;/a&gt; from C.UTF-8.&lt;/p&gt;
&lt;p&gt;Because the database must maintain stable sorting, custom application sorting can only be pushed to the application layer. For example, expression sorting is semantically clear and doesn&amp;rsquo;t affect the database&amp;rsquo;s own choice of collation. If one day pg also supports built-in en_US.utf8, then we can consider built-in semantic sorting.&lt;/p&gt;
&lt;p&gt;During Xinchuang migration, the glibc version of Xinchuang hosts is generally higher than old Intel server glibc versions, likely crossing the 2.28 version. Combined with tight deadlines, KPI pressure, insufficient manpower, and large databases, physical migration is unavoidable. So Xinchuang physical migration needs to pay attention to glibc versions and many anomalies caused by collation.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What to Do After Physical Migration
 &lt;div id="what-to-do-after-physical-migration" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-to-do-after-physical-migration" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Assuming the database is en_US.utf8, provider c, and physical migration across libc versions has already been done, the following operations should be performed:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;I. Official Required Solution&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;At minimum, rebuild problematic indexes. Install the amcheck extension and use the bt_index_check function:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; bt_index_check(&lt;span style="color:#e6db74"&gt;&amp;#39;idx1&amp;#39;&lt;/span&gt;::regclass, &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Refresh database version (pg15+):&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATABASE&lt;/span&gt; name REFRESH &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Check if there are other &lt;a href="https://www.postgresql.org/docs/18/sql-altercollation.html#SQL-ALTERCOLLATION-NOTES" target="_blank" rel="noreferrer"&gt;dependent objects&lt;/a&gt;. If there are, handle them accordingly:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_describe_object(refclassid, refobjid, refobjsubid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;Collation&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_describe_object(classid, objid, objsubid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;Object&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_depend d &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; refclassid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;pg_collation&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; refobjid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collversion &lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt; pg_collation_actual_version(&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After handling, then:&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Refresh collation version (pg10+):&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATION&lt;/span&gt; name REFRESH &lt;span style="color:#66d9ef"&gt;VERSION&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;II. Unofficial Workaround Solutions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I haven&amp;rsquo;t made a complete solution here, just some thoughts.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Handling partition table data written to wrong partition:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Partition key is int/bigint/float, no relation to collation, can be ignored.&lt;/p&gt;
&lt;p&gt;Partition key is time partition, if timestamp, can be ignored. If varchar or other character types, depends on the situation.&lt;/p&gt;
&lt;p&gt;Partition key is character type, refer to &amp;ldquo;a&amp;rdquo; and &amp;ldquo;-&amp;rdquo; sorting (pgconf Collation Challenges Sorting It Out). But note the following points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If querying data, don&amp;rsquo;t query from the parent table; it might crash or fail to return results.&lt;/li&gt;
&lt;li&gt;There&amp;rsquo;s no simple detection solution.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;
&lt;p&gt;Handling primary key/unique key conflicts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Handling fdw sort range anomaly issues.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unknown problems.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;ref
 &lt;div id="ref" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ref" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Locale_data_changes" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Locale_data_changes&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Collations" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Collations&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;pgconf Collation Challenges Sorting It Out&lt;/p&gt;
&lt;p&gt;PFCONF Collations from A to Z&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.unicode.org/reports/tr10/tr10-34.html" target="_blank" rel="noreferrer"&gt;http://www.unicode.org/reports/tr10/tr10-34.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://sourceware.org/glibc/wiki/Release/2.28" target="_blank" rel="noreferrer"&gt;https://sourceware.org/glibc/wiki/Release/2.28&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/18/sql-altercollation.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/18/sql-altercollation.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/18/sql-alterdatabase.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/18/sql-alterdatabase.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/17/locale.html#LOCALE-PROVIDERS" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/17/locale.html#LOCALE-PROVIDERS&lt;/a&gt;&lt;/p&gt;
&lt;hr&gt;</content:encoded></item><item><title>A Brief Review of Logical Replication in Oracle, MySQL, and PostgreSQL</title><link>https://lastdba.com/en/2025/11/30/a-brief-review-of-logical-replication-in-oracle-mysql-and-postgresql/</link><pubDate>Sun, 30 Nov 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/11/30/a-brief-review-of-logical-replication-in-oracle-mysql-and-postgresql/</guid><description>&lt;h3 class="relative group"&gt;PostgreSQL Logical Replication
 &lt;div id="postgresql-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;​​​​


&lt;img src="https://lastdba.com/img/csdn/64e1d30f2123.png" alt="在这里插入图片描述" /&gt;
（https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf）&lt;/p&gt;
&lt;p&gt;PostgreSQL places all logical decoding related matters entirely within the database&amp;rsquo;s replication slots for management — an all-inclusive approach. Early versions had somewhat limited logical replication support, but in recent major versions, logical replication has been one of the primary functional improvements.&lt;/p&gt;
&lt;p&gt;Advantages of the PG approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Very flexible: it exposes the logical decoding interface to users, with multiple types of decoding methods available.&lt;/li&gt;
&lt;li&gt;Users can subscribe to only the data they need based on their requirements.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages of the PG approach:&lt;/p&gt;</description><content:encoded>
&lt;h3 class="relative group"&gt;PostgreSQL Logical Replication
 &lt;div id="postgresql-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;​​​​


&lt;img src="https://lastdba.com/img/csdn/64e1d30f2123.png" alt="在这里插入图片描述" /&gt;
（https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf）&lt;/p&gt;
&lt;p&gt;PostgreSQL places all logical decoding related matters entirely within the database&amp;rsquo;s replication slots for management — an all-inclusive approach. Early versions had somewhat limited logical replication support, but in recent major versions, logical replication has been one of the primary functional improvements.&lt;/p&gt;
&lt;p&gt;Advantages of the PG approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Very flexible: it exposes the logical decoding interface to users, with multiple types of decoding methods available.&lt;/li&gt;
&lt;li&gt;Users can subscribe to only the data they need based on their requirements.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages of the PG approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The number of concepts to learn and the learning cost are relatively higher compared to MySQL. Just the basic concepts — publication, subscription, walsender, replication slots, output plugins, etc. — I believe many people haven&amp;rsquo;t fully grasped their definitions and relationships.&lt;/li&gt;
&lt;li&gt;Does the hardest work and takes the hardest hits. All logical decoding problems are exposed within the database: WAL backlog, large transactions, long transactions, reorder transaction sorting, privilege issues, streaming transmission — these are all problems PG has to deal with.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;MySQL&amp;rsquo;s binlog
 &lt;div id="mysqls-binlog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mysqls-binlog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/668c1dc8ce20.png" alt="在这里插入图片描述" /&gt;
(&lt;a href="https://blog.fasterinfo.top/6243.html" target="_blank" rel="noreferrer"&gt;https://blog.fasterinfo.top/6243.html&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;MySQL places all decoded logical data locally — in binlog files. The approach is simple. &lt;em&gt;MySQL&amp;rsquo;s binlog is roughly equivalent to PostgreSQL with full-table logical replication enabled and written locally.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Advantages of the MySQL approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Simple and straightforward: MySQL doesn&amp;rsquo;t expose the logical decoding interface directly to users. Instead, it provides already-decoded files directly to users, who don&amp;rsquo;t need to care about how parsing works — just read the binlog files.&lt;/li&gt;
&lt;li&gt;Mature ecosystem. I personally believe MySQL&amp;rsquo;s mature ecosystem is closely tied to binlog. During the internet era, PG&amp;rsquo;s logical replication was still weak, while binlog was extremely simple. Downstream parsing of binlog to put data onto other platforms became a common pattern.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages of the MySQL approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All data must be decoded; no customizable subscription. Poor flexibility.&lt;/li&gt;
&lt;li&gt;Two-phase commit. Because MySQL&amp;rsquo;s primary-standby replication heavily depends on binlog, binlog data must be fully flushed to binlog files at commit time. A single commit must write two (or two kinds of) logs — binlog and redolog. Dual log writes are one of MySQL&amp;rsquo;s eternal pain points.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Oracle Logical Replication
 &lt;div id="oracle-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oracle-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8978c46a1452.png" alt="在这里插入图片描述" /&gt;
（https://www.oracle-scn.com/oracle-goldengate-integrated-capture/）&lt;/p&gt;
&lt;p&gt;Oracle itself does have logical Data Guard functionality, but virtually no one uses it. Here we&amp;rsquo;ll only discuss LogMiner. The Oracle database itself provides an interface like LogMiner for parsing logs (e.g., OGG integrated capture mode), but has zero replication link management itself — it relies on third-party tools to create and manage replication links.&lt;/p&gt;
&lt;p&gt;Advantages of the Oracle approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Only provides a parsing interface, no replication link management. For the database itself, this is very hassle-free.&lt;/li&gt;
&lt;li&gt;Pay and you get a solution. Just buy the powerful OGG directly. Don&amp;rsquo;t say Oracle hasn&amp;rsquo;t provided a logical replication solution — we not only have one, it&amp;rsquo;s powerful and highly recognized.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages of the Oracle approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Relies on third-party software to manage replication links.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In summary, PG&amp;rsquo;s logical replication is an all-in-one, do-everything approach — very much in the open-source, technical spirit. MySQL&amp;rsquo;s approach is simple, crude, but effective — somewhat &amp;ldquo;one-step-to-finish.&amp;rdquo; Oracle&amp;rsquo;s approach is: provide an interface and leave everything else to third parties, but from the customer&amp;rsquo;s perspective, there is a mature solution available.&lt;/p&gt;</content:encoded></item><item><title>CXL and PolarDB-CXL</title><link>https://lastdba.com/en/2025/11/30/cxl-and-polardb-cxl/</link><pubDate>Sun, 30 Nov 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/11/30/cxl-and-polardb-cxl/</guid><description>&lt;p&gt;Paper: Unlocking the Potential of CXL for Disaggregated Memory in Cloud-Native Databases&lt;/p&gt;
&lt;p&gt;SIGMOD best paper: &lt;a href="https://sigmod.org/sigmod-awards/sigmod-best-paper-award/" target="_blank" rel="noreferrer"&gt;https://sigmod.org/sigmod-awards/sigmod-best-paper-award/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;CXL and PolarDB-CXL
 &lt;div id="cxl-and-polardb-cxl" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cxl-and-polardb-cxl" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What is CXL
 &lt;div id="what-is-cxl" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-cxl" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;CXL&lt;/strong&gt;: An open industry standard, a high-speed interconnect specification formulated by the CXL Consortium (founded in 2019 by tech giants Intel, AMD, ARM, etc.). It represents the evolutionary direction of computing architecture. Currently at CXL 4.0.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Paper: Unlocking the Potential of CXL for Disaggregated Memory in Cloud-Native Databases&lt;/p&gt;
&lt;p&gt;SIGMOD best paper: &lt;a href="https://sigmod.org/sigmod-awards/sigmod-best-paper-award/" target="_blank" rel="noreferrer"&gt;https://sigmod.org/sigmod-awards/sigmod-best-paper-award/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;CXL and PolarDB-CXL
 &lt;div id="cxl-and-polardb-cxl" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cxl-and-polardb-cxl" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What is CXL
 &lt;div id="what-is-cxl" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-cxl" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;CXL&lt;/strong&gt;: An open industry standard, a high-speed interconnect specification formulated by the CXL Consortium (founded in 2019 by tech giants Intel, AMD, ARM, etc.). It represents the evolutionary direction of computing architecture. Currently at CXL 4.0.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Feature&lt;/th&gt;
 &lt;th&gt;CXL 1.0/1.1&lt;/th&gt;
 &lt;th&gt;CXL 2.0&lt;/th&gt;
 &lt;th&gt;CXL 3.0/3.1&lt;/th&gt;
 &lt;th&gt;CXL 4.0 (latest)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Release&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;March/Sept 2019&lt;/td&gt;
 &lt;td&gt;October 2020&lt;/td&gt;
 &lt;td&gt;August 2022 / November 2023&lt;/td&gt;
 &lt;td&gt;November 2025&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Base Protocol&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;PCIe 5.0 (32 GT/s)&lt;/td&gt;
 &lt;td&gt;PCIe 5.0 (32 GT/s)&lt;/td&gt;
 &lt;td&gt;PCIe 6.0 (64 GT/s)&lt;/td&gt;
 &lt;td&gt;PCIe 7.0 (128 GT/s)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Max Bandwidth&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;1TB/s&lt;/td&gt;
 &lt;td&gt;1TB/s&lt;/td&gt;
 &lt;td&gt;2TB/s&lt;/td&gt;
 &lt;td&gt;4TB/s+&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Topology Scale&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Point-to-point / simple star&lt;/td&gt;
 &lt;td&gt;Single switch (≤32 nodes)&lt;/td&gt;
 &lt;td&gt;Multi-level Fabric (4096 nodes)&lt;/td&gt;
 &lt;td&gt;Ultra-large-scale Fabric&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;From my research, two descriptions of CXL left the deepest impression:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Memory as a Service&lt;/li&gt;
&lt;li&gt;Near-memory computing and expansion&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;CXL switch&lt;/strong&gt;: A switching chip, physical hardware. Many vendors are working on industrial implementations. The paper specifically references products from XConn Tech: &lt;a href="https://www.xconn-tech.com/products" target="_blank" rel="noreferrer"&gt;CXL 2.0 switch&lt;/a&gt;. Note that as of November 22, 2025, XConn only has CXL 2.0 switches, no 3.0 products. However, there are products on the market supporting 3.0+ standards, such as &lt;a href="https://panmnesia.com/news/en/2025-11-13-switch-sample/" target="_blank" rel="noreferrer"&gt;Panmnesia CXL 3.2 Fabric Switch&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PolarCXLMem&lt;/strong&gt;: According to the paper, &amp;ldquo;the first CXL-switch-based disaggregated memory system.&amp;rdquo; But the paper also states &amp;ldquo;we leverage the world&amp;rsquo;s first CXL switch[50]&amp;rdquo; — specifically referring to the XConn tech CXL 2.0 switch — and then says &amp;ldquo;PolarCXLMem is the first CXL-switch-based disaggregated memory.&amp;rdquo; This can be interpreted in two ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first disaggregated memory system based on CXL switches&lt;/li&gt;
&lt;li&gt;The first disaggregated memory system based on XConn tech CXL 2.0 switches&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PolarDB-CXL&lt;/strong&gt;: The paper doesn&amp;rsquo;t actually use this term, but the industry uses it. It represents &amp;ldquo;integrate &lt;em&gt;PolarCXLMem&lt;/em&gt; into the multi-primary version of PolarDB, known as PolarDB-MP&amp;rdquo; — essentially &amp;ldquo;&lt;strong&gt;the CXL-upgraded version of PolarDB-MP&lt;/strong&gt;.&amp;rdquo; The paper repeatedly uses lengthy phrases but never uses the term polardb-cxl. For convenience, this article uses polardb-cxl to represent its essential meaning.&lt;/p&gt;

&lt;h3 class="relative group"&gt;RDMA vs CXL
 &lt;div id="rdma-vs-cxl" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rdma-vs-cxl" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PolarDB-MP uses RDMA architecture, while PolarDB-CXL uses CXL architecture:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fc236ee67755.png" alt="image-20251122115316339" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://medium.com/@anan.mirji/cxl-switch-vs-rdma-a-technical-comparison-for-high-performance-interconnects-6aaa031cde31" target="_blank" rel="noreferrer"&gt;https://medium.com/@anan.mirji/cxl-switch-vs-rdma-a-technical-comparison-for-high-performance-interconnects-6aaa031cde31&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;RDMA architecture is a cross-host distributed interconnect architecture, while CXL architecture is a single-host expanded interconnect architecture.&lt;/p&gt;
&lt;p&gt;Key differences:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Dimension&lt;/th&gt;
 &lt;th&gt;RDMA Architecture&lt;/th&gt;
 &lt;th&gt;CXL Architecture&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Topology&lt;/td&gt;
 &lt;td&gt;Multi-host + network switch distributed arch&lt;/td&gt;
 &lt;td&gt;Single-host + CXL switch expanded arch&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Communication&lt;/td&gt;
 &lt;td&gt;Network (InfiniBand/RoCE)&lt;/td&gt;
 &lt;td&gt;PCIe bus (CXL based on PCIe physical layer)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Core Components&lt;/td&gt;
 &lt;td&gt;RDMA NIC (dedicated NIC)&lt;/td&gt;
 &lt;td&gt;CXL Controller, CXL Switch&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Resource Ownership&lt;/td&gt;
 &lt;td&gt;&amp;ldquo;Remote resources&amp;rdquo; across independent hosts&lt;/td&gt;
 &lt;td&gt;&amp;ldquo;Expanded resources&amp;rdquo; within the host architecture&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;CXL&amp;rsquo;s Advantages
 &lt;div id="cxls-advantages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cxls-advantages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;CXL&amp;rsquo;s advantages over RDMA:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Low latency: CXL connects to host or device memory via PCIe; RDMA requires protocol interface conversion between InfiniBand and PCIe.&lt;/p&gt;
&lt;p&gt;Instruction support: CXL provides native load/store instructions, allowing the CPU to directly manipulate remote CXL device memory as if it were local memory. RDMA requires reading from remote memory to local memory, processing locally, then writing back to remote memory.&lt;/p&gt;
&lt;p&gt;Simplified applications: RDMA requires special interfaces and drivers, needing professionals to design complex programs; CXL provides transparent memory space, greatly simplifying application design.&lt;/p&gt;
&lt;p&gt;Memory fusion: CXL 3.0 supports physical hardware-level memory pooling.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Problems with PolarDB-MP and the value CXL provides:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;CXL&amp;rsquo;s critique of MP:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Memory pages are 4-16K, so even when only a small amount of data transfer is needed, data must move between local and shared memory, causing read/write amplification.&lt;/li&gt;
&lt;li&gt;Maintaining local memory adds extra memory overhead, reducing throughput.&lt;/li&gt;
&lt;li&gt;Recovery is very time-consuming.&lt;/li&gt;
&lt;li&gt;RDMA is far better than TCP/IP, but under high concurrency, it suffers from &amp;ldquo;doorbell register implicit contention&amp;rdquo; and &amp;ldquo;cache thrashing&amp;rdquo; issues.&lt;/li&gt;
&lt;li&gt;The database itself must maintain shared memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Benefits CXL brings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Eliminates the &amp;ldquo;shared memory - local memory&amp;rdquo; hierarchical memory structure, also eliminating the maintenance overhead and read/write amplification. Because CXL load/store to local memory is fast enough, it allows directly storing all buffer pages.&lt;/li&gt;
&lt;li&gt;Uses cache lines (64B) as the minimum transfer unit between CPU cache and main memory, rather than PolarDB-MP&amp;rsquo;s 4K pages.&lt;/li&gt;
&lt;li&gt;Saves main memory. DRAM costs are very high, roughly 40-50% of server/rack costs.&lt;/li&gt;
&lt;li&gt;Simplifies system design. Minimal modifications to existing systems are important for commercial database stability.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PolarRecv&lt;/em&gt;: An instant recovery system built on CXL. After a database crash, data and metadata remain on CXL, allowing direct reads of consistent state from CXL memory, so recovery is very fast. (This seems similar to how PG&amp;rsquo;s page cache helps fast startup after a crash.)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;DRAM vs RDMA vs CXL&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/66af810eb94e.png" alt="image-20251122155133782" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ae9cc24e5158.png" alt="image-20251122155109014" /&gt;&lt;/p&gt;
&lt;p&gt;When data volume is small, RDMA has significantly higher latency than CXL; with larger data, RDMA&amp;rsquo;s latency improves slightly. Local DRAM access is slightly better than CXL access.&lt;/p&gt;
&lt;p&gt;Overall, CXL memory access latency is slightly higher than DRAM but better than RDMA.&lt;/p&gt;
&lt;p&gt;Regarding CXL&amp;rsquo;s higher latency vs DRAM, the paper explains: &amp;ldquo;database buffer pool operations are more sensitive to bandwidth than latency&amp;rdquo; — for database memory, bandwidth matters more than latency.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Custom Rack
 &lt;div id="custom-rack" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#custom-rack" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Self-developed physical prototype rack. The left rack integrates two CXL switch-enabled clusters, each connected to memory devices and hosts; the right rack integrates one CXL switch connected to memory devices and hosts.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/45333b6bf088.png" alt="image-20251122151718276" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;PolarCXLMem
 &lt;div id="polarcxlmem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#polarcxlmem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The CXL 2.0 switch supports memory pooling, but the drivers don&amp;rsquo;t fully support it, so PolarCXLMem still designed its own CXL memory allocation and usage — it&amp;rsquo;s not fully transparent. PolarCXLMem processes CXL memory into a multi-tenant model, with different host nodes allocated different CXL memory regions.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ddc3509d74d0.png" alt="image-20251123094443287" /&gt;&lt;/p&gt;
&lt;p&gt;PolarCXLMem characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Nodes have their own CXL memory regions; different nodes&amp;rsquo; CXL memory does not overlap.&lt;/li&gt;
&lt;li&gt;The buffer pool is allocated at database startup (by the CXL mem manager in the diagram) and does not change during runtime.&lt;/li&gt;
&lt;li&gt;The memory unit structure in CXL mem is a block, which stores page data and page metadata, including: id (page id), lock state (whether the page is locked for update), prev/next (LRU doubly-linked list), lsn (latest log sequence number of the page).&lt;/li&gt;
&lt;li&gt;Free list / in-use list is used for LRU.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Question: PG&amp;rsquo;s page header has lsn, starting free space pointer, prune xid, etc. What does PolarDB-CXL&amp;rsquo;s page header structure look like?&lt;/em&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;PolarRecv
 &lt;div id="polarrecv" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#polarrecv" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PolarDB-MP was designed based on RDMA, where data pages are written locally, and the disaggregated shared memory doesn&amp;rsquo;t contain the latest version of data pages. This means after a host crash, you must scan and apply all redo log files (the paper says redo, not WAL) or pages from a small amount of shared memory.&lt;/p&gt;
&lt;p&gt;CXL switches have independent power, so even if the host crashes, the latest data remains in CXL memory. PolarRecv leverages this to dramatically speed up database recovery after host crashes.&lt;/p&gt;
&lt;p&gt;However, while CXL switch memory is transparent and persistent, directly using it after a crash still requires handling these issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LRU lists may be inconsistent at crash time&lt;/li&gt;
&lt;li&gt;B-tree SMO (B-tree structure changes), such as index splits, may be inconsistent at crash time&lt;/li&gt;
&lt;li&gt;Pages being updated at crash time may be inconsistent&lt;/li&gt;
&lt;li&gt;The redo log buffer uses local DRAM. When the redo log hasn&amp;rsquo;t been flushed to disk at crash time, the page LSN in the CXL buffer pool may be greater than the LSN in the redo log file, directly violating the ARIES principle&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PolarRecv&amp;rsquo;s design strategies:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use mutex to protect the LRU structure. The mutex lock state indicates whether LRU was being modified at crash time. If so, LRU must be rebuilt; if not, use the LRU directly from CXL memory.&lt;/li&gt;
&lt;li&gt;During B-tree SMO, a mini-transaction protects index pages. This mini-transaction is a two-phase lock corresponding to page locks. It&amp;rsquo;s only flushed to the redo log when the mini-transaction commits. So during recovery, if an index page is found with a write lock, recover from the redo logs.&lt;/li&gt;
&lt;li&gt;PolarCXL&amp;rsquo;s read/write locks are stored in CXL memory. If a write lock still exists, it means the update was in an intermediate state at crash time and not completed. In this case, honestly read the page from the redo log file rather than reading an inconsistent page from CXL memory.&lt;/li&gt;
&lt;li&gt;During recovery, first obtain the maximum LSN from the redo log, then check the lock and LSN of pages in CXL memory. If a page&amp;rsquo;s LSN in CXL memory is greater than the max LSN, rebuild the page using redo log information rather than using the CXL memory version.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Memory Fusion
 &lt;div id="memory-fusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-fusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Because PolarCXLMem is designed based on the CXL 2.0 switch, and CXL 3.0 supports memory fusion, memory fusion design is still needed. Since each node&amp;rsquo;s buffer pool is placed in isolation in PolarCXLMem, &lt;strong&gt;CXL 2.0&amp;rsquo;s memory fusion is achieved through DBP metadata management — each buffer pool only stores the page&amp;rsquo;s CXL memory address, not the page itself.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d3b28a223927.png" alt="image-20251123142605871" /&gt;&lt;/p&gt;
&lt;p&gt;To understand this diagram, you need to distinguish between CXL memory, DBP, and local buffer:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CXL memory is the physical hardware, CXL mem itself.&lt;/li&gt;
&lt;li&gt;DBP is a region carved out of CXL for managing memory fusion services.&lt;/li&gt;
&lt;li&gt;Local metadata buffer contains local buffer metadata and part of CXL.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Also understand that for each page in the buffer pool, there are two flags:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;invalid: After another node writes to the page, the current node needs to invalidate its local CPU cache.&lt;/li&gt;
&lt;li&gt;removal: When a page moves from the in-use list to the free list, all nodes must set the removal flag.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Memory fusion page access flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The requested page is not in the local page metadata buffer:
1.1 Allocate a new meta record from the free list, and provide invalid and removal addresses to the memory fusion service via RPC.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The requested page is in the local page metadata buffer:
2.1 First check the removal flag. If removal is set, it means the memory fusion service has already reclaimed the page, and a new memory address must be requested from the memory fusion service via RPC.
2.2 Then check the invalid flag. If invalid is set, it means the page has been modified by another node, and the CPU cache must be invalidated to ensure consistency.&lt;/p&gt;
&lt;p&gt;Fusion consistency:&lt;/p&gt;
&lt;p&gt;Since CXL 2.0 doesn&amp;rsquo;t have memory fusion, CPU caches aren&amp;rsquo;t automatically updated. PolarCXL implements multi-node concurrent write control through page-level locks.&lt;/p&gt;
&lt;p&gt;Nodes must acquire read/write locks to read/write pages. &lt;strong&gt;When one node is writing to a page, other nodes cannot read or write that page.&lt;/strong&gt; After a node finishes writing, it must also:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Flush the CPU cache to CXL mem (cache line flush) to ensure CXL mem has the latest page version.&lt;/li&gt;
&lt;li&gt;Set the invalid flag to ensure other nodes don&amp;rsquo;t read stale page versions from their CPU caches.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Memory fusion summary:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CXL 2.0 itself supports incomplete memory fusion, meaning the database layer still needs to design a memory fusion scheme. Memory pages are accessed via CXL addresses, rather than local/remote access to entire pages as in the RDMA approach. The local CPU cache needs the database layer to flush it to ensure node data access consistency — this is a hard limitation. This also means cross-node updates still use exclusive page-level locks (the RDMA approach also uses exclusive page-level locks).&lt;/strong&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Performance Evaluation
 &lt;div id="performance-evaluation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#performance-evaluation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Multi-Node Read/Write
 &lt;div id="multi-node-readwrite" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multi-node-readwrite" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Benchmarking with 12 instances on a 192 vCPU host, comparing RDMA (PolarDB-MP) vs CXL (PolarDB-MP with PolarCXLMem) performance:&lt;/p&gt;
&lt;p&gt;Point queries:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c5ac5a1f0d82.png" alt="image-20251124083738393" /&gt;&lt;/p&gt;
&lt;p&gt;Range queries:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/169bdabdbf3c.png" alt="image-20251125082404440" /&gt;&lt;/p&gt;
&lt;p&gt;Read-write:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/532e83b71906.png" alt="image-20251125082418710" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Point queries: Read amplification is most severe for point queries. CXL&amp;rsquo;s bandwidth consumption is 3-4x lower than RDMA. When reaching 3 nodes, RDMA bandwidth is already saturated — adding more nodes doesn&amp;rsquo;t improve bandwidth.&lt;/li&gt;
&lt;li&gt;Range queries: Read amplification is less severe. Only at &amp;gt;4 nodes does it reach the bandwidth ceiling of 11GB/s, while CXL can still scale linearly with nodes.&lt;/li&gt;
&lt;li&gt;Read-write: Performance is similar to range queries, just with smaller differences.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;PolarRecv Recovery Time
 &lt;div id="polarrecv-recovery-time" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#polarrecv-recovery-time" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;vanilla: Refers to the general approach, probably similar to PG reading from local cache or disk (possibly polar redo).&lt;/li&gt;
&lt;li&gt;RDMA-based: Refers to PolarDB-MP where some data can be read from disaggregated shared storage.&lt;/li&gt;
&lt;li&gt;PolarRecv: Refers to continuing to read most data from CXL, with only a small amount of partial pages needing recovery from redo files.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6d0ae251efed.png" alt="image-20251125085711364" /&gt;&lt;/p&gt;
&lt;p&gt;The paper discusses recovery time in 2 phases: startup/recovery and reaching pre-crash load levels. Read-only doesn&amp;rsquo;t need recovery — as long as there&amp;rsquo;s data, you can start and take load. When writes exist, recovery is needed, and the advantage of continuing to read from CXL memory becomes apparent. The difference between 1-minute, 2-minute, and 4-minute recovery times is significant — it could be the difference between business being nearly imperceptible and noticeably impacted.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Shared Data Updates
 &lt;div id="shared-data-updates" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-data-updates" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The focal point of distributed database performance combat is updates to shared data. After PolarDB-MP crushed Taurus-MM, PolarDB-CXL also crushed PolarDB-MP:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/30660f6b6fd5.png" alt="image-20251130164309249" /&gt;&lt;/p&gt;
&lt;p&gt;At 0% shared data, the RDMA-based solution just accesses local buffers, and PolarDB-CXL just treats CXL as a memory pool. Even so, CXL-based still performs better, mainly due to the read/write amplification and bandwidth ceiling issues of the RDMA-based solution mentioned earlier.&lt;/p&gt;
&lt;p&gt;From the performance comparison chart above, it&amp;rsquo;s clear that PolarDB-CXL significantly outperforms PolarDB-MP. The data is very clear. However, note that when shared data &amp;gt;60%, PolarDB-CXL&amp;rsquo;s performance improvement becomes less significant, mainly because:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Page-level locks become the bottleneck.&lt;/li&gt;
&lt;li&gt;As lock contention intensifies, processes enter sleep states, and frequent context switching further exacerbates resource contention.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PolarDB-CXL advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Eliminates RDMA&amp;rsquo;s &amp;ldquo;local-remote&amp;rdquo; hierarchical memory structure design.&lt;/li&gt;
&lt;li&gt;Resolves RDMA&amp;rsquo;s read/write amplification problem.&lt;/li&gt;
&lt;li&gt;Provides a CXL-based memory pool.&lt;/li&gt;
&lt;li&gt;PolarRecv, based on CXL persistent memory, enables faster database crash recovery.&lt;/li&gt;
&lt;li&gt;Benchmarking shows PolarDB-MP CXL outperforms PolarDB-MP RDMA.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PolarDB-CXL disadvantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cross-node updates still use page-level locks, which remain the main performance bottleneck in shared data update scenarios.&lt;/li&gt;
&lt;li&gt;The CXL 2.0 switch seems a bit dated — by the time the paper was published, switch devices supporting 3.2 were already available, and CXL 4.0 was announced in November 2025. We can predict future databases built on newer CXL standard switch devices.&lt;/li&gt;
&lt;li&gt;The paper quality isn&amp;rsquo;t actually as high as the MP paper — it mainly revolves around solutions for the CXL 2.0 switch physical hardware, which differs from the extensive database-layer design found in the PolarDB-MP paper.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;Original link: &lt;a href="https://lastdba.com/2025/11/30/" target="_blank" rel="noreferrer"&gt;https://lastdba.com/2025/11/30/&lt;/a&gt;论文精读polar-db-cxl2025-sigmod最佳工业论文/&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>Paper Deep Read: PolarDB-MP | 2024 SIGMOD Best Industrial Paper</title><link>https://lastdba.com/en/2025/11/30/paper-deep-read-polardb-mp-2024-sigmod-best-industrial-paper/</link><pubDate>Sun, 30 Nov 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/11/30/paper-deep-read-polardb-mp-2024-sigmod-best-industrial-paper/</guid><description>&lt;p&gt;Paper: PolarDB-MP: A Multi-Primary Cloud-Native Database via Disaggregated Shared Memory&lt;/p&gt;
&lt;p&gt;SIGMOD best paper: &lt;a href="https://sigmod.org/sigmod-awards/sigmod-best-paper-award/" target="_blank" rel="noreferrer"&gt;https://sigmod.org/sigmod-awards/sigmod-best-paper-award/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Foreword and Abstract
 &lt;div id="foreword-and-abstract" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#foreword-and-abstract" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The paper opens with the problem: primary-replica architecture&amp;rsquo;s write throughput is limited by the primary. Shared-nothing architecture offers scalable multi-primary clusters that can solve the single-primary limitation, but this architecture suffers performance bottlenecks due to distributed transaction overhead. Recently, shared-storage-based cloud-native multi-primary databases have emerged, but under high-conflict scenarios, they face high conflict resolution costs and low data fusion efficiency.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Paper: PolarDB-MP: A Multi-Primary Cloud-Native Database via Disaggregated Shared Memory&lt;/p&gt;
&lt;p&gt;SIGMOD best paper: &lt;a href="https://sigmod.org/sigmod-awards/sigmod-best-paper-award/" target="_blank" rel="noreferrer"&gt;https://sigmod.org/sigmod-awards/sigmod-best-paper-award/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Foreword and Abstract
 &lt;div id="foreword-and-abstract" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#foreword-and-abstract" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The paper opens with the problem: primary-replica architecture&amp;rsquo;s write throughput is limited by the primary. Shared-nothing architecture offers scalable multi-primary clusters that can solve the single-primary limitation, but this architecture suffers performance bottlenecks due to distributed transaction overhead. Recently, shared-storage-based cloud-native multi-primary databases have emerged, but under high-conflict scenarios, they face high conflict resolution costs and low data fusion efficiency.&lt;/p&gt;
&lt;p&gt;So the problem is: single-primary primary-replica, shared-nothing, and shared-storage cloud-native multi-primary architectures all have their own issues.&lt;/p&gt;
&lt;p&gt;This paper proposes PolarDB-MP, a novel multi-primary cloud-native database combining disaggregated shared memory with shared storage. (Since multi-primary cloud-native databases already exist, it needs to be &amp;ldquo;novel.&amp;rdquo;)&lt;/p&gt;
&lt;p&gt;PolarDB-MP&amp;rsquo;s basic characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;All nodes can equally access all data, allowing transactions to be processed independently on a single node, &lt;strong&gt;without traditional distributed transaction mechanisms&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Shared storage: PolarStore and PolarFS, or other compatible shared storage solutions.&lt;/li&gt;
&lt;li&gt;Built on disaggregated shared memory.&lt;/li&gt;
&lt;li&gt;Low-latency communication via &lt;strong&gt;RDMA&lt;/strong&gt; (Remote Direct Memory Access).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LLSN&lt;/strong&gt; (Local Logical Sequence Number): Used to establish partial order for WAL logs generated by different nodes, combined with custom recovery strategies to ensure consistency and efficiency during abnormal recovery.&lt;/li&gt;
&lt;li&gt;Core component &lt;strong&gt;PMFS&lt;/strong&gt; (Polar Multi-Primary Fusion Server) responsible for:
&lt;ul&gt;
&lt;li&gt;Transaction Fusion — transaction ordering and visibility management&lt;/li&gt;
&lt;li&gt;Buffer Fusion — distributed shared buffer mechanism&lt;/li&gt;
&lt;li&gt;Lock Fusion — cross-node concurrency control&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Classification
 &lt;div id="classification" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#classification" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The classification is mainly to understand PolarDB-MP&amp;rsquo;s historical position and the &amp;ldquo;first&amp;rdquo; qualifier:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;PolarDB-MP is the first multi-primary cloud-native database that utilizes disaggregated shared memory and shared storage for transaction coordination and buffer fusion&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3532d59aa524.png" alt="image-20251109213814089" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Competitor Weaknesses
 &lt;div id="competitor-weaknesses" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#competitor-weaknesses" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Shared-nothing products: The paper doesn&amp;rsquo;t call out individual products, just one line: transactions accessing across multiple partitions require significant additional overhead for distributed transactions.&lt;/p&gt;
&lt;p&gt;Oracle:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Expensive distributed lock management&lt;/li&gt;
&lt;li&gt;Expensive network overhead&lt;/li&gt;
&lt;li&gt;Reliance on sophisticated hardware (alien tech)&lt;/li&gt;
&lt;li&gt;Difficult to migrate to cloud, or higher TCO (including maintenance and labor costs) compared to cloud-native databases after migration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AWS Aurora-MM:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Uses optimistic transaction model; high transaction abort rates under conflicts&lt;/li&gt;
&lt;li&gt;In some scenarios, 4-node throughput is lower than single-node&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Huawei Taurus-MM:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pessimistic transaction model. Relies on page storage and log replay to ensure cache consistency, with high overhead in concurrency control and data synchronization.&lt;/li&gt;
&lt;li&gt;Under 50% shared data read-write workload, 8 nodes only achieve 1.5x single-node performance improvement&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Oracle critique here is mainly plausible-sounding trash talk, while Aurora-MM and Taurus-MM have original vendor citations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Aurora-MM &amp;ldquo;in some scenarios, 4-node throughput is lower than single-node&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Taurus-MM &amp;ldquo;under 50% shared data read-write workload, 8 nodes only achieve 1.5x single-node performance improvement&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Transaction Fusion
 &lt;div id="transaction-fusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-fusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Transaction Fusion Overview
 &lt;div id="transaction-fusion-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-fusion-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;How does multi-primary ensure consistent data views?&lt;/p&gt;
&lt;p&gt;Snapshot isolation is a common MVCC implementation. A characteristic of snapshot isolation is that queries or transactions must maintain their consistent data view during execution. But in multi-primary architecture, local nodes cannot guarantee consistent data views due to remote data updates.&lt;/p&gt;
&lt;p&gt;To solve this, general multi-primary shared-storage architectures introduce global transaction mechanisms (Aurora-MM or Taurus-MM). PolarDB-MP introduces an innovative technique — transaction fusion within PMFS. &lt;strong&gt;Each node only maintains local transaction information, which can be accessed by other nodes via RDMA.&lt;/strong&gt; In contrast to global transactions, transaction fusion is decentralized.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Local Transactions and TIT Table
 &lt;div id="local-transactions-and-tit-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#local-transactions-and-tit-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Each node in PolarDB-MP maintains a small amount of memory to store local transaction information (accessible by other nodes via RDMA). This local transaction information is stored in the transaction Information Table (TIT).&lt;/p&gt;
&lt;p&gt;TIT table contents:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transaction object pointer&lt;/li&gt;
&lt;li&gt;Commit timestamp (CTS) assigned by the global timestamp coordinator (TSO)&lt;/li&gt;
&lt;li&gt;version, representing different transactions in the same slot&lt;/li&gt;
&lt;li&gt;ref, indicating whether this transaction is being waited on by other transactions for lock release (probably PLock or RLock)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/90641c3618d1.png" alt="image-20251101131556184" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;How Transactions Proceed
 &lt;div id="how-transactions-proceed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-transactions-proceed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When a transaction begins, a local transaction id (presumably txid) is assigned, and the TIT slot stores the transaction object pointer, ref initialized to 0, and CTS initialized to &lt;code&gt;CSN_INIT&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;PolarDB-MP uses a global transaction ID to identify a transaction: global transaction ID = (node_id, trx_id, slot_id, version). The global transaction ID does not include CTS. To know the commit order of transactions, such as when constructing a transaction visibility view, you need to go through the global transaction ID, via RDMA, to the target node to find CTS (similar to PG&amp;rsquo;s &lt;code&gt;pg_xact_commit_timestamp()&lt;/code&gt; function, which finds the corresponding transaction commit time from local files using the transaction id).&lt;/p&gt;
&lt;p&gt;If trx_id is the transaction ID in PG, then node_id + trx_id can identify the global uniqueness of a transaction, or node_id + slot_id + version could also work to some extent (when slot id is not reused, e.g., at a given moment it uniquely identifies a transaction). Of course, the extra information combined is also unique. After all, this information is key to PolarDB-MP&amp;rsquo;s transaction fusion implementation.&lt;/p&gt;
&lt;p&gt;Each transaction constructs a visibility view using the global transaction ID and CTS. The visibility view concept is consistent with PG: the current read view can read data rows committed before the read view, and the latest version rows.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Accessing Remote CTS
 &lt;div id="accessing-remote-cts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#accessing-remote-cts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since CTS is local (in TIT or on the local filesystem), obtaining the reading transaction&amp;rsquo;s CTS is an interesting task:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/41db536b2114.png" alt="image-20251101153437311" /&gt;&lt;/p&gt;
&lt;p&gt;1.1 If a row&amp;rsquo;s CTS is CSN_INIT/CTS_INIT, meaning the transaction is still active, return the maximum CTS to indicate it&amp;rsquo;s invisible to all transactions except itself.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;If a row&amp;rsquo;s CTS is not CSN_INIT/CTS_INIT, meaning the transaction has committed, and it&amp;rsquo;s in the local TIT, directly return CTS.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If a row has no CTS, obtain CTS via the row&amp;rsquo;s g_trx_id.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;2.1 If the transaction belongs to the local node (g_trx_id has node id), read from local filesystem to local TIT.&lt;/p&gt;
&lt;p&gt;2.2 If the transaction doesn&amp;rsquo;t belong to the local node, read from remote filesystem to remote TIT via RDMA.&lt;/p&gt;
&lt;p&gt;3.1 If slot.version != g_trx_id.version, the transaction must have committed, so the row is definitely visible to all transactions. Return minimum CTS to indicate visibility to all transactions.&lt;/p&gt;
&lt;p&gt;3.2 If slot.version = g_trx_id.version, refer to 1.1, 1.2.&lt;/p&gt;
&lt;p&gt;PolarDB-MP&amp;rsquo;s transaction visibility concept is very similar to PG&amp;rsquo;s, except PG uses txid instead of CTS to indicate transaction ordering and doesn&amp;rsquo;t need to consider remote access.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Row Update Transactions
 &lt;div id="row-update-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#row-update-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Additionally, row updates are also very similar:&lt;/p&gt;
&lt;p&gt;When PolarDB-MP updates a row, besides updating the data itself, it must also:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Update the row&amp;rsquo;s global transaction ID (g_trx_id) (if it&amp;rsquo;s an in-row update, then it modifies PG&amp;rsquo;s row header).&lt;/li&gt;
&lt;li&gt;Update the row&amp;rsquo;s CTS. (The paper doesn&amp;rsquo;t specify whether this is in the row header or filesystem. If similar to PG, it should be in the &lt;code&gt;commit_ts&lt;/code&gt; directory on the filesystem. Polar not confirmed.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Questions About Transaction Fusion (Things I Didn&amp;rsquo;t Understand)
 &lt;div id="questions-about-transaction-fusion-things-i-didnt-understand" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#questions-about-transaction-fusion-things-i-didnt-understand" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;g_trx_id is row metadata written to disk. If nodes are added or removed, does the node_id in the data row&amp;rsquo;s g_trx_id need updating? If not, which node should the row be loaded into when read next time?&lt;/p&gt;
&lt;p&gt;A new row&amp;rsquo;s CTS is stored on local node A. If another node B updates this row, is the new CTS on node A or B?&lt;/p&gt;
&lt;p&gt;&amp;ldquo;assigned a read view, which consists of its own g_trx_id and the current CTS.&amp;rdquo; Do read-only transactions also get assigned a g_trx_id when constructing a read view?&lt;/p&gt;
&lt;p&gt;Without a doubt, a parameter like &lt;code&gt;track_commit_timestamp&lt;/code&gt; must be forcibly enabled.&lt;/p&gt;
&lt;p&gt;If there are many writes on node A and reads on node B, B&amp;rsquo;s reads will access A&amp;rsquo;s TIT data via RDMA — does this generate significant network IO? Should this be considered when designing read-write separation or multi-node reads and writes? The original paper might answer this — &amp;ldquo;Multi-primary architectures inherently require synchronizing large amounts of data and messages between nodes to support concurrent access across multiple nodes. As network technology develops (InfiniBand, RDMA) and achieves commercial deployment, the network bottleneck becomes less significant.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Global timestamps could become a bottleneck in distributed systems. PolarDB-SCC is a shared-storage-based timestamp solution that appears to perform well. Due to time constraints, I&amp;rsquo;ll set this aside for now.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Buffer Fusion
 &lt;div id="buffer-fusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buffer-fusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Buffer Fusion Introduction
 &lt;div id="buffer-fusion-introduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buffer-fusion-introduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Each node in PolarDB-MP can update any data page, leading to substantial data transfer. Buffer Fusion&amp;rsquo;s distributed buffer pool (DBP) is designed to solve this problem. Each node has a local buffer pool (LBP), which is a subset of DBP.&lt;/p&gt;

&lt;h3 class="relative group"&gt;How Buffer Fusion Works
 &lt;div id="how-buffer-fusion-works" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-buffer-fusion-works" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;LBP has two new metadata items for pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;valid: whether the page has been updated by another node&lt;/li&gt;
&lt;li&gt;r_addr: pointer to the page in DBP&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/652bf5e74943.png" alt="image-20251102105723909" /&gt;&lt;/p&gt;
&lt;p&gt;When accessing a page from LBP, the current node must first check if the page is valid. If invalid, it must access DBP via r_addr. After DBP stores a new version of the page, buffer fusion invalidates all remote pages. In LBP, dirty pages are periodically flushed to DBP in the background or after releasing the PLock lock.&lt;/p&gt;
&lt;p&gt;Page access steps:&lt;/p&gt;
&lt;p&gt;1.1 If the page is in LBP and valid, access directly.
1.2 If the page is in LBP and invalid, access DBP via RDMA.
2. If the page is in neither LBP nor DBP, read from shared storage.
3. The page is loaded from a node into LBP and registered in DBP.&lt;/p&gt;
&lt;p&gt;PolarDB&amp;rsquo;s buffer fusion key component is disaggregated shared memory. It appears to be a/group of physical hardware or an integrated component built on top of it, separate from compute nodes. This differs significantly from memory in traditional distributed systems.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s also different from transaction fusion: transaction fusion requires accessing remote nodes with the same architecture, while buffer fusion doesn&amp;rsquo;t require accessing remote nodes with the same architecture — it separately accesses the disaggregated shared storage component.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Questions About Buffer Fusion (Things I Didn&amp;rsquo;t Understand)
 &lt;div id="questions-about-buffer-fusion-things-i-didnt-understand" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#questions-about-buffer-fusion-things-i-didnt-understand" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Disaggregated shared memory seems like a component separate from standard hosts — so what exactly is it?&lt;/p&gt;

&lt;h2 class="relative group"&gt;Lock Fusion
 &lt;div id="lock-fusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lock-fusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Lock Types in Lock Fusion
 &lt;div id="lock-types-in-lock-fusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lock-types-in-lock-fusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Buffer fusion solves how nodes access remote data; lock fusion solves concurrent access control.&lt;/p&gt;
&lt;p&gt;Buffer fusion has two types of locks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;page-locking (PLock): Similar to latches, controlling atomic access and internal structure consistency. Single-node page access doesn&amp;rsquo;t use PLock.&lt;/li&gt;
&lt;li&gt;row-locking (RLock): Responsible for cross-node transaction control, following the two-phase lock protocol.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;PLock Access Flow
 &lt;div id="plock-access-flow" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#plock-access-flow" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;(The paper doesn&amp;rsquo;t say where lock fusion occurs. Since PLock is a page-level latch and page fusion happens on shared memory, I&amp;rsquo;ll assume lock fusion also occurs on shared memory, as this is easier to understand.)&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Before updating/reading a page, the &lt;em&gt;local lock manager&lt;/em&gt; checks whether the local node already holds the corresponding X/S PLock (or higher-level lock).
1.1 If yes, execute in place.
1.2 If no, acquire PLock through Lock Fusion.&lt;/li&gt;
&lt;li&gt;Lock fusion checks for conflicts before responding; if a conflict exists, the request waits.&lt;/li&gt;
&lt;li&gt;When PLock is released by a node, it notifies Lock Fusion, which updates PLock&amp;rsquo;s state and notifies other nodes to continue their operations.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/92aa2082d68f.png" alt="image-20251102142359091" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;PLock Lazy Releasing
 &lt;div id="plock-lazy-releasing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#plock-lazy-releasing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;According to the PLock access flow above, a PLock is immediately released after local operations complete. This may not be optimal — according to temporal locality: &amp;ldquo;a data item or instruction accessed at a given time is likely to be accessed again in the near future.&amp;rdquo; Lazy releasing minimizes PLock lock RPC access load.&lt;/p&gt;
&lt;p&gt;The principle is simple: PLock is not immediately released after use on the local node; it&amp;rsquo;s only released when ref reaches 0.&lt;/p&gt;
&lt;p&gt;When other nodes need PLock, Lock Fusion also sends negotiation messages to intervene when the local node is holding the lock; the local node must communicate with Lock Fusion rather than autonomously handling PLock. Lock Fusion uses a &amp;ldquo;first-in-first-out&amp;rdquo; strategy to resolve cross-node lock ownership, again until the local node&amp;rsquo;s ref = 0, at which point other nodes can acquire the lock.&lt;/p&gt;
&lt;p&gt;Lazy releasing is an effective distributed lock solution, balancing local lock optimization with global lock allocation.&lt;/p&gt;

&lt;h3 class="relative group"&gt;RLock Overview
 &lt;div id="rlock-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rlock-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;RLock uses the global transaction ID for determination (similar to PG). According to the transaction fusion content, the global transaction ID contains node id, transaction id, slot id, version. So when a local node reads a row, it can directly obtain the lock information on the row, know where the lock is (node id), and know if the lock is active.&lt;/p&gt;
&lt;p&gt;There are two interesting points about determining transaction activity:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;From the transaction fusion flow of accessing remote CTS: if the transaction&amp;rsquo;s CTS is a valid value, or the transaction is in the same slot in TIT but not the same version, the transaction has definitely committed, so no need to check activity. If the source transaction is not active, there&amp;rsquo;s no need to wait for locks — proceed directly.&lt;/li&gt;
&lt;li&gt;PG has the concept of a minimum active transaction ID, which also exists in PolarDB-MP. If the transaction ID on the row is less than the global minimum active transaction ID, the source transaction must have also committed (or rolled back).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;How RLock Works
 &lt;div id="how-rlock-works" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-rlock-works" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Local rows are handled locally; only conflicts are processed in Lock Fusion; cross-node row locks require RLock. &amp;ldquo;The transaction ID in the row functions as a lock indicator. So this protocol only supports exclusive (X) lock. The shared (S) lock on a row is not supported in PolarDB-MP, but it&amp;rsquo;s acceptable.&amp;rdquo; Only truly conflicting exclusive locks need RLock; shared locks don&amp;rsquo;t need RLock.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a84d705b09b8.png" alt="image-20251102155613001" /&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;T30 reads the row from shared storage and can determine from the row&amp;rsquo;s metadata (g_trx_id) that the transaction is active and which node it&amp;rsquo;s on.&lt;/li&gt;
&lt;li&gt;T30 remotely adjusts T10&amp;rsquo;s transaction ref.&lt;/li&gt;
&lt;li&gt;T30 sends a wait status to the Lock Fusion service.&lt;/li&gt;
&lt;li&gt;Lock Fusion adds wait information to the wait info table.&lt;/li&gt;
&lt;li&gt;T10 finishes execution and notifies Lock Fusion.&lt;/li&gt;
&lt;li&gt;Lock Fusion checks the wait info table, then notifies T30 it can continue.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Questions About Lock Fusion (Things I Didn&amp;rsquo;t Understand)
 &lt;div id="questions-about-lock-fusion-things-i-didnt-understand" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#questions-about-lock-fusion-things-i-didnt-understand" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&amp;ldquo;when attempting to update a row, it must already hold an X PLock lock on the page containing the row&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Updating also requires holding an exclusive PLock on the page, meaning updates on the same page block each other — doesn&amp;rsquo;t this affect concurrency? Locally, there shouldn&amp;rsquo;t be such behavior; PG doesn&amp;rsquo;t have page-exclusive locks for update scenarios.&lt;/p&gt;
&lt;p&gt;In the &amp;ldquo;Logs ordering and recovery&amp;rdquo; chapter, there are two statements: &amp;ldquo;Thanks to the PLock design, only one transaction can update a page at a time&amp;rdquo; and &amp;ldquo;When a page is updated across two nodes, one node pushes its updated page to the DBP before releasing the PLock, allowing the next node to retrieve it from the DBP.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Yes, during &lt;strong&gt;cross-node&lt;/strong&gt; data updates, there are page-level exclusive locks.&lt;/p&gt;

&lt;h2 class="relative group"&gt;PMFS Summary (Hot Take)
 &lt;div id="pmfs-summary-hot-take" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pmfs-summary-hot-take" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PMFS (Polar Multi-Primary Fusion Server) is the core component implementing PolarDB-MP&amp;rsquo;s multi-primary distributed system. Among its features, the &lt;strong&gt;global transaction ID&lt;/strong&gt; design is ingenious — it transforms PG&amp;rsquo;s transaction ID into one containing node information, transaction id, and transaction fusion&amp;rsquo;s slot and version information, placed in the row header. This has several benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Directly accessing a row reveals the row&amp;rsquo;s version ordering.&lt;/li&gt;
&lt;li&gt;Directly accessing a row reveals which node updated it.&lt;/li&gt;
&lt;li&gt;Directly accessing a row reveals whether cross-node locks may exist.&lt;/li&gt;
&lt;li&gt;Uses minimum active transactions to reduce conflict determination.&lt;/li&gt;
&lt;li&gt;Uses global transaction ID information to achieve distributed retrieval of transaction commit timestamps (CTS).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Buffer fusion and lock fusion in PMFS appear highly dependent on the shared memory component.&lt;/li&gt;
&lt;li&gt;RDMA is omnipresent throughout.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Log Ordering
 &lt;div id="log-ordering" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#log-ordering" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Partial Order
 &lt;div id="partial-order" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partial-order" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;First, WAL is generated on each node without any concurrency control mechanism — each writes independently to shared storage. Each node&amp;rsquo;s LSN is sequential for that node, but across multiple nodes, WAL records don&amp;rsquo;t exhibit global ordering.&lt;/p&gt;
&lt;p&gt;But is global ordering needed when writing WAL records?&lt;/p&gt;
&lt;p&gt;From the paper, most of the time it&amp;rsquo;s not needed.&lt;/p&gt;
&lt;p&gt;Only one case requires guaranteed global ordering during writing: cross-node updates to the same page.&lt;/p&gt;
&lt;p&gt;However, according to the PMFS lock fusion mechanism, cross-node updates to the same page are exclusive. Lock fusion can ensure the ordering of cross-node page updates.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Recovery Ordering
 &lt;div id="recovery-ordering" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#recovery-ordering" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since LLSNs from cross-node writes come from multiple nodes and are likely not in order, recovery needs to be done in order. Reading all WAL records and sorting by LLSN is a simple approach, but massive sorting is very resource-intensive.&lt;/p&gt;
&lt;p&gt;PolarDB-MP proposes segment-wise sorting of LLSN — each segment is called a chunk, with chunk boundaries called LLSN bounds. PolarDB-MP can guarantee that an LLSN bound is always less than the next bound, then sort LLSNs within each chunk.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Questions About Log Ordering (Things I Didn&amp;rsquo;t Understand)
 &lt;div id="questions-about-log-ordering-things-i-didnt-understand" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#questions-about-log-ordering-things-i-didnt-understand" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&amp;ldquo;utilizing redo (write-ahead) logs for data recovery and undo logs for rolling back uncommitted changes&amp;rdquo;&lt;/p&gt;
&lt;p&gt;PolarDB-MP has undo log files? What is this undo for?&lt;/p&gt;
&lt;p&gt;I didn&amp;rsquo;t see anything particularly special about LLSN; the paper doesn&amp;rsquo;t detail its structure. LSN seems sufficient — maybe there are differences regarding global transaction IDs.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Evaluation
 &lt;div id="evaluation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#evaluation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Read-only operations are all local, so adding nodes linearly increases throughput. If read-write/write-only data is well-partitioned and doesn&amp;rsquo;t cross nodes, it&amp;rsquo;s also nearly linear.&lt;/p&gt;
&lt;p&gt;The problem lies in shared data across read-write/write-only nodes, which is the ultimate test of distributed database performance.&lt;/p&gt;
&lt;p&gt;The paper directly compares against Huawei&amp;rsquo;s Taurus-MM. The conclusion: PolarDB-MP&amp;rsquo;s cross-node write performance is indeed significantly better.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/eec82fe8cb39.png" alt="image-20251109212445723" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Nitpicking
 &lt;div id="nitpicking" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#nitpicking" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The paper mentions Taurus-MM&amp;rsquo;s performance improvement under 8-node shared data in two places, but the data is inconsistent:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The eight-node cluster only improves the throughput by 1.8× compared to the single-node version in the read-write workload with 50% shared data.&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;the throughput of Taurus-MM&amp;rsquo;s eight-node cluster is approximately 1.8× that of a single node under the SysBench write-only workload with 30% shared data, illustrating the trade-offs and challenges in optimizing multi-primary cloud databases&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Sometimes 30% shared data, sometimes 50% — not very rigorous. The original &lt;a href="https://www.vldb.org/pvldb/vol16/p3488-depoutovitch.pdf" target="_blank" rel="noreferrer"&gt;Taurus MM paper&lt;/a&gt; says 50%:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/34c0ec5e940b.png" alt="image-20251025162117902" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Not much to summarize — see the &lt;em&gt;Foreword and Abstract&lt;/em&gt; and &lt;em&gt;PMFS Summary&lt;/em&gt; sections.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Original link: &lt;a href="https://lastdba.com/2025/11/30/" target="_blank" rel="noreferrer"&gt;https://lastdba.com/2025/11/30/&lt;/a&gt;论文精读polar-db-mp2024-sigmod最佳工业论文/&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>Case: From Inaccurate DISTINCT to the Principles of DISTINCT Estimation</title><link>https://lastdba.com/en/2025/10/19/case-from-inaccurate-distinct-to-the-principles-of-distinct-estimation/</link><pubDate>Sun, 19 Oct 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/10/19/case-from-inaccurate-distinct-to-the-principles-of-distinct-estimation/</guid><description>&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;n_distinct&lt;/code&gt; statistic was severely inaccurate.&lt;/p&gt;
&lt;p&gt;This problem appeared across multiple databases. For example:&lt;/p&gt;
&lt;p&gt;A table with 200 million rows and a true DISTINCT count of 8 million had a statistics DISTINCT value of only 40,000.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Sampling Model
 &lt;div id="sampling-model" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sampling-model" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7e0b33a60cf4.png" alt="Does the standby have its own statistics? · PostgreSQL Apprentice" /&gt;&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;n_distinct&lt;/code&gt; statistic was severely inaccurate.&lt;/p&gt;
&lt;p&gt;This problem appeared across multiple databases. For example:&lt;/p&gt;
&lt;p&gt;A table with 200 million rows and a true DISTINCT count of 8 million had a statistics DISTINCT value of only 40,000.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Sampling Model
 &lt;div id="sampling-model" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sampling-model" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7e0b33a60cf4.png" alt="Does the standby have its own statistics? · PostgreSQL Apprentice" /&gt;&lt;/p&gt;
&lt;p&gt;The default &lt;code&gt;default_statistics_target=100&lt;/code&gt; means 30,000 rows are sampled from 30,000 pages.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt; tablzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: analyzing &lt;span style="color:#e6db74"&gt;&amp;#34;public.tablzl1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: do_analyze_rel, &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;332&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: &lt;span style="color:#e6db74"&gt;&amp;#34;tablzl1&amp;#34;&lt;/span&gt;: scanned &lt;span style="color:#ae81ff"&gt;30000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22963751&lt;/span&gt; pages, containing &lt;span style="color:#ae81ff"&gt;1061942&lt;/span&gt; live &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3953&lt;/span&gt; dead &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;30000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; sample, &lt;span style="color:#ae81ff"&gt;812872389&lt;/span&gt; estimated total &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: acquire_sample_rows, &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1340&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note &amp;ldquo;scanned 30000&amp;rdquo; and &amp;ldquo;30000 rows in sample&amp;rdquo;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;DISTINCT Estimation Algorithm
 &lt;div id="distinct-estimation-algorithm" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#distinct-estimation-algorithm" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The DISTINCT estimation algorithm in &lt;code&gt;analyze.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Estimate the number of distinct values using the estimator
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * proposed by Haas and Stokes in IBM Research Report RJ 10025:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 *		n*d / (n - f1 + f1*n/N)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * where f1 is the number of distinct values that occurred
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * exactly once in our sample of n rows (from a total of N),
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * and d is the total number of distinct values in the sample.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * This is their Duj1 estimator; the other estimators they
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * recommend are considerably more complex, and are numerically
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * very unstable when n is much smaller than N.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * In this calculation, we consider only non-nulls. We used to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * include rows with null values in the n and N counts, but that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * leads to inaccurate answers in columns with many nulls, and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * it&amp;#39;s intuitively bogus anyway considering the desired result is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * the number of distinct non-null values.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * We assume (not very reliably!) that all the multiply-occurring
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * values are reflected in the final track[] list, and the other
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * nonnull values all appeared but once. (XXX this usually
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * results in a drastic overestimate of ndistinct. Can we do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * any better?)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 *----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			f1 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; nonnull_cnt &lt;span style="color:#f92672"&gt;-&lt;/span&gt; summultiple;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			d &lt;span style="color:#f92672"&gt;=&lt;/span&gt; f1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt; nmultiple;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;		n &lt;span style="color:#f92672"&gt;=&lt;/span&gt; samplerows &lt;span style="color:#f92672"&gt;-&lt;/span&gt; null_cnt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;		N &lt;span style="color:#f92672"&gt;=&lt;/span&gt; totalrows &lt;span style="color:#f92672"&gt;*&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1.0&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; stats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;stanullfrac);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;		stadistinct;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;n*d / (n - f1 + f1*n/N)&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;n&lt;/code&gt; = number of sample rows (rows scanned)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;d&lt;/code&gt; = number of distinct values found in the sample&lt;/li&gt;
&lt;li&gt;&lt;code&gt;f1&lt;/code&gt; = number of values appearing exactly once in the sample&lt;/li&gt;
&lt;li&gt;&lt;code&gt;N&lt;/code&gt; = total number of rows in the table&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Algorithm paper: &lt;a href="https://hugepdf.com/download/download-extended-version-of-this-paper_pdf" target="_blank" rel="noreferrer"&gt;https://hugepdf.com/download/download-extended-version-of-this-paper_pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The paper is rather dense, so let&amp;rsquo;s work through some assumptions to understand this DISTINCT algorithm:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Assume all values appear exactly once, and the table is large (n &amp;laquo; N), so f1 = d, n/N ≈ 0&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;d*d / (d - d + d*0) = d²/0&lt;/code&gt; — this would evaluate to -1.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Assume all values appear exactly once, and the table is small (n = N), so f1 = d, n/N = 1&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;n*d / (n - d + d*1) = d&lt;/code&gt; — the sampled distinct count, which equals the number of sampled rows.&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Assume no values appear exactly once in the sample, i.e., f1 = 0&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;n*d / (n - f1 + f1*n/N) = n*d / n = d&lt;/code&gt; — just the distinct count in the sample.&lt;/p&gt;
&lt;p&gt;If a column is populated by inserting several rows of the same value, then several rows of another value, like:&lt;/p&gt;
&lt;p&gt;11, 2, 2, 2, 2, 3, 3, 3, &amp;hellip;&lt;/p&gt;
&lt;p&gt;3.1 Small table, all 30,000 rows sampled, true distinct = 10,000 (assumed): estimated distinct = d = 10,000&lt;/p&gt;
&lt;p&gt;3.2 Large table, sample contains both repeating values and singletons (some repeating values only have one row captured), i.e., n = 30,000, n/N ≈ 0&lt;/p&gt;
&lt;p&gt;&lt;code&gt;n*d / (n - f1 + f1*n/N) = n*d / (n - f1) = 30000*d/(30000-f1)&lt;/code&gt; — the larger the distinct count in the sample, the larger the estimated distinct; the larger the number of singletons, the larger the estimated distinct.&lt;/p&gt;
&lt;p&gt;Summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DISTINCT estimation is directly related to the distinct count and singleton count in the sample&lt;/li&gt;
&lt;li&gt;If the singleton count = 0, then larger samples yield larger estimated distinct values&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Verification
 &lt;div id="verification" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#verification" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since the default maximum sample size is 30,000 rows, for tables larger than this, the estimator is likely to underestimate DISTINCT. Note: the data should not have too many unique values.&lt;/p&gt;
&lt;p&gt;Testing a table with different sample sizes:&lt;/p&gt;
&lt;p&gt;Table: reltuples = 800 million, relpages = 20 million, size = 175GB, true column distinct = 100 million&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;target statistics&lt;/th&gt;
 &lt;th&gt;pages sampling ratio (approx)&lt;/th&gt;
 &lt;th&gt;tuples sampling ratio (approx)&lt;/th&gt;
 &lt;th&gt;n_distinct&lt;/th&gt;
 &lt;th&gt;execution time&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;50&lt;/td&gt;
 &lt;td&gt;0.00075&lt;/td&gt;
 &lt;td&gt;0.00001875&lt;/td&gt;
 &lt;td&gt;60k&lt;/td&gt;
 &lt;td&gt;2s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;100&lt;/td&gt;
 &lt;td&gt;0.0015&lt;/td&gt;
 &lt;td&gt;0.0000375&lt;/td&gt;
 &lt;td&gt;110k&lt;/td&gt;
 &lt;td&gt;5s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;1000&lt;/td&gt;
 &lt;td&gt;0.015&lt;/td&gt;
 &lt;td&gt;0.000375&lt;/td&gt;
 &lt;td&gt;1.03M&lt;/td&gt;
 &lt;td&gt;58s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3000&lt;/td&gt;
 &lt;td&gt;0.045&lt;/td&gt;
 &lt;td&gt;0.001125&lt;/td&gt;
 &lt;td&gt;2.68M&lt;/td&gt;
 &lt;td&gt;3min 1s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10000&lt;/td&gt;
 &lt;td&gt;0.15&lt;/td&gt;
 &lt;td&gt;0.00375&lt;/td&gt;
 &lt;td&gt;6.75M&lt;/td&gt;
 &lt;td&gt;7min 21s&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(maximum target statistics is 10000)&lt;/p&gt;
&lt;p&gt;A rough conclusion: n_distinct and ANALYZE execution time grow proportionally with the sample size.&lt;/p&gt;
&lt;p&gt;n_distinct grows with sample size, while pages and tuples estimates remain consistently accurate.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Solution
 &lt;div id="solution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;For extremely large tables, consider partitioning or optimizing based on actual SQL patterns.&lt;/p&gt;
&lt;p&gt;You can also adjust the statistics target. The default &lt;code&gt;default_statistics_target=100&lt;/code&gt; means 30,000 rows from 30,000 pages.&lt;/p&gt;
&lt;p&gt;Temporary fix:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; default_statistics_target&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; tab1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Long-term fix:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;STATISTICS&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Column-level statistics target has the highest priority, overriding &lt;code&gt;default_statistics_target&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Maximum statistics target is 10000&lt;/li&gt;
&lt;li&gt;The table&amp;rsquo;s sampling target is determined by the maximum column target:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Determine how many rows we need to sample, using the worst case from
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * all analyzable columns. We use a lower bound of 100 rows to avoid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * possible overflow in Vitter&amp;#39;s algorithm. (Note: that will also be the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * target in the corner case where there are no analyzable columns.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	targrows &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; attr_cnt; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (targrows &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; vacattrstats[i]&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;minrows)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			targrows &lt;span style="color:#f92672"&gt;=&lt;/span&gt; vacattrstats[i]&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;minrows;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (ind &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; ind &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; nindexes; ind&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		AnlIndexData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;thisdata &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;indexdata[ind];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; thisdata&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;attr_cnt; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (targrows &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; thisdata&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;vacattrstats[i]&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;minrows)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				targrows &lt;span style="color:#f92672"&gt;=&lt;/span&gt; thisdata&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;vacattrstats[i]&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;minrows;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If ANALYZE collects more or fewer rows than expected, check &lt;code&gt;pg_statistic&lt;/code&gt; for per-column &lt;code&gt;stattarget&lt;/code&gt; settings:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; attrelid::regclass,attname,attstattarget &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_attribute &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; attrelid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;tab1&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; attstattarget &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;For large tables where columns are non-unique but have high distinct counts (a realistic scenario), the sampling algorithm underestimates the DISTINCT value, and this is positively correlated with the sampling ratio. The default sampling ratio is too small for large tables. You can increase it, but even the maximum is not that large.&lt;/p&gt;</content:encoded></item><item><title>Case Study: Performance Degradation After Adding an Index and the Generic Plan</title><link>https://lastdba.com/en/2025/09/13/case-study-performance-degradation-after-adding-an-index-and-the-generic-plan/</link><pubDate>Sat, 13 Sep 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/09/13/case-study-performance-degradation-after-adding-an-index-and-the-generic-plan/</guid><description>&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;An index was added the night before, and the next morning the CPU was maxed out. The problematic SQL was easy to locate — just one query. The SQL was running for over 30 seconds, but the day before it only took about 3 seconds, so we needed to examine the before-and-after execution plan changes.&lt;/p&gt;
&lt;p&gt;Only the key parts of the execution plan are shown below.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;An index was added the night before, and the next morning the CPU was maxed out. The problematic SQL was easy to locate — just one query. The SQL was running for over 30 seconds, but the day before it only took about 3 seconds, so we needed to examine the before-and-after execution plan changes.&lt;/p&gt;
&lt;p&gt;Only the key parts of the execution plan are shown below.&lt;/p&gt;
&lt;p&gt;Execution plan before adding the index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;92&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2259694&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;265822&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; uk_lzl_task &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_task t (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;20007&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;195&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_by)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11337&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14842&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3053&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1467&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202501_task_no_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1594&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202502 cc_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;67&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3066&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1604&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202502_task_no_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1605&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202503_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202503 cc_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1362&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;61&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1637&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202504_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202504 cc_4 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;604&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1795&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202505_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202505 cc_5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;445&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202506_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202506 cc_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;583&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1675&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202507_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202507 cc_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;633&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1973&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202508_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202508 cc_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;619&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1720&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_202509_task_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_202509 cc_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;893&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1521&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-07 09:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-03 12:56:44.973&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The created_date time range searches for data within 1 year. The index added the night before was on created_date.&lt;/p&gt;
&lt;p&gt;Execution plan after adding the index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;23740&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;82&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;191&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: ((cc.task_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;23376&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;114435&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Subplans Removed: &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202501_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1450&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8958&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202502_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202502 cc_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1822&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7405&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202503_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202503 cc_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1430&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7917&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202504_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202504 cc_4 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2412&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11041&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202505_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202505 cc_5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2260&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13381&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202506_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202506 cc_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3930&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17832&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202507_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202507 cc_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;77&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21786&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202508_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202508 cc_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4736&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;72&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;22033&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202509_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202509 cc_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;627&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1893&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; ai_outbound_call_task t (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((created_by)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; idx_ai_call_task_c (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_by)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The new execution plan switched from using the task_no index to using the created_date index, and changed from a Nested Loop to a Hash Join. The cost dropped from 2,259,694 to 23,740 — a 100x reduction. However, the actual execution time increased by roughly 10x.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Diagnosis
 &lt;div id="problem-diagnosis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-diagnosis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s work through three questions to analyze and diagnose the issue:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Why did the optimizer suggest the created_date index?&lt;/li&gt;
&lt;li&gt;Why did it end up using the new index?&lt;/li&gt;
&lt;li&gt;Why is the estimated row count very small even though the actual execution time is very long?&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Why Did the Optimizer Suggest the created_date Index?
 &lt;div id="why-did-the-optimizer-suggest-the-created_date-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-did-the-optimizer-suggest-the-created_date-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If we directly substitute the parameters from the PostgreSQL log into the SQL text, the execution plan is actually the good one — the one that runs in 3 seconds using the task_no index. The optimization engineer also ran it this way and found it to be fine. But in production, this wasn&amp;rsquo;t the execution plan that was used.&lt;/p&gt;
&lt;p&gt;Even when we force PostgreSQL &lt;em&gt;not&lt;/em&gt; to use the task_no index, the optimizer chooses a sequential scan rather than the created_date index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (((cc.task_no)::text &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t.task_no)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2794425&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;22238757&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;193060&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1585238&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202502 cc_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;178567&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1480969&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202503 cc_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;191073&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1583356&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is very strange: no matter how we ran it ourselves, we couldn&amp;rsquo;t get it to use the bad created_date index. So how did production end up using it?&lt;/p&gt;
&lt;p&gt;The answer lies in bind variables — it was likely a &lt;strong&gt;generic plan&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Characteristics of the generic plan:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When &lt;code&gt;plan_cache_mode = auto&lt;/code&gt;, PostgreSQL compares the generic plan cost against the average cost of the first five hard parses (custom plans). If the generic plan has a lower cost, it is used and subsequent executions skip hard parsing; otherwise, every execution undergoes hard parsing (see the source function &lt;code&gt;choose_custom_plan&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;What the generic plan looks like has nothing to do with the actual bind variable values.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This is easy to reproduce using bind variables via PREPARE/EXECUTE:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; sql1(&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;,text) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COUNT&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xxxxxxx...;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;367&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;220&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;254&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;386&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;235&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;343&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;234&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;110&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;233&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;570&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;2025-01-08 11:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2025-09-04 08:31:43&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;LIUZHILONG62&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;70678&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;344&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;678&lt;/span&gt;) &lt;span style="color:#75715e"&gt;-- 6th execution is significantly slower
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx &lt;span style="color:#75715e"&gt;-- pg14 supports pg_prepared_statements
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;generic_plans &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;custom_plans &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first 5 hard parses (custom plans) all executed quickly. The 6th execution used the generic plan, which used the created_date index — this was the exact production failure plan, which was extremely slow.&lt;/p&gt;
&lt;p&gt;So while the optimization suggestion to use the created_date index was somewhat problematic, when you substituted bind variables with actual values and ran EXPLAIN, the execution plan was correct. In production, however, the application used bind variables, and the generic plan kicked in — causing the failure.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Is the Estimated Row Count Small But the Actual Execution Time Very Long?
 &lt;div id="why-is-the-estimated-row-count-small-but-the-actual-execution-time-very-long" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-is-the-estimated-row-count-small-but-the-actual-execution-time-very-long" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The failing execution plan has a problem: the estimated cost is too small, and the estimated rows are too few.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_lzltab_202501_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1450&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8958&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((created_date &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (created_date &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From a business logic perspective, this looks abnormal. The created_date condition spans multiple partitions, and since created_date is the partition key, &lt;code&gt;WHERE created_date &amp;gt;= xx AND &amp;lt;= yy&lt;/code&gt; must be contiguous. The selectivity on a sub-partition should always be 1, meaning rows should equal the sub-partition row count — several million, not several thousand.&lt;/p&gt;
&lt;p&gt;At first I thought it was a statistics issue, but the statistics were fairly accurate — the historical partition data for 202501 hadn&amp;rsquo;t changed.&lt;/p&gt;
&lt;p&gt;Since this is a generic plan issue, we need to examine the generic plan cost estimation by reading the source code. Cost estimation is more complex, but rows estimation is relatively easier to understand and locate.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;calc_rangesel&lt;/span&gt;(TypeCacheEntry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;typcache, VariableStatData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;vardata,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; RangeType &lt;span style="color:#f92672"&gt;*&lt;/span&gt;constval, Oid operator)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* with any other operator, empty Op non-empty matches nothing */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			selec &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1.0&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; empty_frac) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; hist_selec;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* all range operators are strict */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	selec &lt;span style="color:#f92672"&gt;*=&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1.0&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; null_frac);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;range_select = (1 - null_frac) * histogram_selectivity&lt;/code&gt;. The range histogram selectivity looks at the histogram buckets hit by the range plus any matching MCV entries. However, we don&amp;rsquo;t need to compute all this for this case.&lt;/p&gt;
&lt;p&gt;Because the generic plan does not look at the histogram:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * rangesel -- restriction selectivity for range operators
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Datum
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;rangesel&lt;/span&gt;(PG_FUNCTION_ARGS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If we got a valid constant on one side of the operator, proceed to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * estimate using statistics. Otherwise punt and return a default constant
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * estimate. Note that calc_rangesel need not handle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * OID_RANGE_ELEM_CONTAINED_OP.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (constrange)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		selec &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;calc_rangesel&lt;/span&gt;(typcache, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;vardata, constrange, operator);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		selec &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;default_range_selectivity&lt;/span&gt;(operator);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;calc_rangesel&lt;/code&gt; is the selectivity calculation function that takes constant values (used above). The &lt;code&gt;else&lt;/code&gt; branch calls &lt;code&gt;default_range_selectivity&lt;/code&gt;, which does not pass any constants.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Returns a default selectivity estimate for given operator, when we don&amp;#39;t
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * have statistics or cannot use them for some reason.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;default_range_selectivity&lt;/span&gt;(Oid operator)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (operator)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; OID_RANGE_CONTAINS_ELEM_OP:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; OID_RANGE_ELEM_CONTAINED_OP:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * &amp;#34;range @&amp;gt; elem&amp;#34; is more or less identical to a scalar
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * inequality &amp;#34;A &amp;gt;= b AND A &amp;lt;= c&amp;#34;.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; DEFAULT_RANGE_INEQ_SEL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The default range selectivity define:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* default selectivity estimate for range inequalities &amp;#34;A &amp;gt; b AND A &amp;lt; c&amp;#34; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define DEFAULT_RANGE_INEQ_SEL	0.005&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s verify this against the production row estimate:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; reltuples::bigint&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;005&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab_202501&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8958&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;350&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This matches the actual estimated rows of 8958:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idx_lzltab_202501_created_date &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzltab_202501 cc_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1450&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8958&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So the new execution plan&amp;rsquo;s inaccurate estimate is because the generic plan uses a default selectivity of 0.005.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Why Does the Generic Plan Exist, and the Problem with Soft Parsing
 &lt;div id="why-does-the-generic-plan-exist-and-the-problem-with-soft-parsing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-does-the-generic-plan-exist-and-the-problem-with-soft-parsing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s easier to think of the generic plan as a &amp;ldquo;DEFAULT estimate plan.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Why does the generic plan always seem to have problems?&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s trace the reasoning chain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The generic plan exists to reduce hard parsing, i.e., to enable soft parsing.&lt;/li&gt;
&lt;li&gt;If we don&amp;rsquo;t hard-parse every execution, we can reuse an execution plan without passing specific parameter values.&lt;/li&gt;
&lt;li&gt;If we don&amp;rsquo;t pass parameters and directly use an execution plan, that plan must be generated in advance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ways to generate an execution plan in advance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A parameter-less execution plan (the generic plan)&lt;/li&gt;
&lt;li&gt;Reuse an execution plan generated from the first few executions with parameters (PostgreSQL doesn&amp;rsquo;t have this)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If we use a generic plan, it can be inaccurate, for example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ol&gt;
&lt;li&gt;Data skew (e.g., a particular MCV has a very high frequency, like &lt;code&gt;WHERE a = 1&lt;/code&gt; but &lt;code&gt;a = 1&lt;/code&gt; appears extremely often). This heavily depends on what the parameter value actually is, but the generic plan receives no parameters, so the plan cannot be accurate.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Evenly distributed data where selectivity still cannot be accurately calculated (e.g., &lt;code&gt;WHERE a &amp;gt; $1 AND a &amp;lt; $2&lt;/code&gt;). Without knowing the range, no one can compute the selectivity. The generic plan receives no parameters, so the plan cannot be accurate.&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If we reused plans from the first few parameterized executions (which PostgreSQL doesn&amp;rsquo;t do), they could also be inaccurate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Data skew: the first few parameter values may not be representative, and they would heavily influence what the subsequent fixed plan looks like.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Categories of Generic Plan Estimation Problems
 &lt;div id="categories-of-generic-plan-estimation-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#categories-of-generic-plan-estimation-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Because the comparison requires 5 custom plans first, generic plan problems can be divided into two categories:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The first 5 SQL executions are not representative. This is closely tied to the first 5 execution plans and depends on data skew and whether the first 5 parameter values are representative.&lt;/li&gt;
&lt;li&gt;The generic plan itself is problematic. Due to data skew or the inability to accurately compute selectivity for evenly distributed data, the generic plan itself is inefficient.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Optimization Recommendations
 &lt;div id="optimization-recommendations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#optimization-recommendations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Based on this case, generic plan issues can appear on partitioned tables. The partition key is contiguous, and selectivity when scanning all partitions should be 1, but the generic plan uses 0.005, which can easily lead to a &amp;ldquo;full index scan&amp;rdquo; scenario.&lt;/p&gt;
&lt;p&gt;So during optimization, we need to consider more:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Avoid creating too many indexes that confuse the optimizer.&lt;/li&gt;
&lt;li&gt;Eliminate generic plan interference. Use &lt;code&gt;EXECUTE&lt;/code&gt; to truly run the query 6 times.&lt;/li&gt;
&lt;li&gt;At the session level, set &lt;code&gt;plan_cache_mode = 'force_generic_plan'&lt;/code&gt; or &lt;code&gt;set plan_cache_mode = 'force_custom_plan'&lt;/code&gt; to compare execution plans. Or, on pg16+, use &lt;code&gt;EXPLAIN (GENERIC_PLAN)&lt;/code&gt; to compare.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syntax reference:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--prepare/excute
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; sql1(text) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COUNT&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; LZL &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- run 6 times first
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; sql1(&lt;span style="color:#e6db74"&gt;&amp;#39;zzz&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements &lt;span style="color:#75715e"&gt;-- view prepared statement info, current session only
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Compare execution plans by setting session parameters before EXPLAIN EXECUTE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; plan_cache_mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;force_generic_plan&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; plan_cache_mode&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;force_custom_plan&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Directly view generic plan, pg16+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (GENERIC_PLAN) xx &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content:encoded></item><item><title>Query Conflicts: From a Static Table Conflict to Its Root Cause</title><link>https://lastdba.com/en/2025/09/13/query-conflicts-from-a-static-table-conflict-to-its-root-cause/</link><pubDate>Sat, 13 Sep 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/09/13/query-conflicts-from-a-static-table-conflict-to-its-root-cause/</guid><description>&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The Symptom
 &lt;div id="the-symptom" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-symptom" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A static historical table with no updates whatsoever — yet queries on the same-city standby consistently hit query conflicts:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;40001&lt;/span&gt;: canceling &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; due &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; conflict &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; recovery
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: &lt;span style="color:#66d9ef"&gt;User&lt;/span&gt; query might have needed &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; see &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions that must be removed.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ProcessInterrupts, postgres.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3197&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;30534&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;973&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;535&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Why a Query Conflict on a Static Table Matters
 &lt;div id="why-a-query-conflict-on-a-static-table-matters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-a-query-conflict-on-a-static-table-matters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;My understanding was that a static table should never experience conflicts (this understanding was wrong — I&amp;rsquo;ll explain later).&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The Symptom
 &lt;div id="the-symptom" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-symptom" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A static historical table with no updates whatsoever — yet queries on the same-city standby consistently hit query conflicts:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;40001&lt;/span&gt;: canceling &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; due &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; conflict &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; recovery
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: &lt;span style="color:#66d9ef"&gt;User&lt;/span&gt; query might have needed &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; see &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions that must be removed.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ProcessInterrupts, postgres.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3197&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;30534&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;973&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;535&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Why a Query Conflict on a Static Table Matters
 &lt;div id="why-a-query-conflict-on-a-static-table-matters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-a-query-conflict-on-a-static-table-matters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;My understanding was that a static table should never experience conflicts (this understanding was wrong — I&amp;rsquo;ll explain later).&lt;/p&gt;
&lt;p&gt;The official documentation lists the conflict cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Access Exclusive locks taken on the primary server, including both explicit &lt;code&gt;LOCK&lt;/code&gt; commands and various DDL actions, conflict with table accesses in standby queries.&lt;/li&gt;
&lt;li&gt;Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.&lt;/li&gt;
&lt;li&gt;Dropping a database on the primary conflicts with sessions connected to that database on the standby.&lt;/li&gt;
&lt;li&gt;Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still &amp;ldquo;see&amp;rdquo; any of the rows to be removed.&lt;/li&gt;
&lt;li&gt;Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;LOCK, DDL, drop tablespace, drop database — definitely none of those.&lt;/p&gt;
&lt;p&gt;Vacuum — none either, confirmed by &lt;code&gt;pg_stat_all_tables.last_autovacuum&lt;/code&gt; and WAL vacuum records.&lt;/p&gt;
&lt;p&gt;The official documentation&amp;rsquo;s explanation stops there. I carefully verified that none of the above applied.&lt;/p&gt;
&lt;p&gt;Extrapolating from existing knowledge, &lt;em&gt;perhaps&lt;/em&gt; other scenarios could kill the xmin held by a standby query&amp;rsquo;s snapshot. For example, in-page pruning removes xmin from rows on a page — if the standby query&amp;rsquo;s snapshot still depends on those xmins, theoretically a conflict could occur. But a page belongs to a specific table, and querying only one table holds only snapshots and xmins on that table. So, &lt;em&gt;theoretically&lt;/em&gt;, in-page pruning on table A &lt;strong&gt;should&lt;/strong&gt; not cause a query conflict on table B (this understanding was also wrong — I&amp;rsquo;ll explain later).&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s official documentation on query conflict scenarios is fairly vague and doesn&amp;rsquo;t explain well why a static table can experience conflicts. Even combining it with my own extrapolations, there shouldn&amp;rsquo;t be a conflict. But I noticed this pattern seemed to exist on many instances, so it was worth investigating.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Root Cause Analysis
 &lt;div id="root-cause-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#root-cause-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since the startup process kills the query, checking the startup process&amp;rsquo;s pstack should reveal the conflict function:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;212012&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002b283f63d783 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; __select_nocancel () &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt;lib64&lt;span style="color:#f92672"&gt;/&lt;/span&gt;libc.so.&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000008fcf5a &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pg_usleep (microsec&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgsleep.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000787905 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; WaitExceedsMaxStandbyDelay (wait_event_info&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134217762&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; standby.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;208&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; ResolveRecoveryConflictWithVirtualXIDs (waitlist&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2398a50, reason&lt;span style="color:#f92672"&gt;=&lt;/span&gt;reason&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, wait_event_info&lt;span style="color:#f92672"&gt;=&lt;/span&gt;wait_event_info&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134217762&lt;/span&gt;, report_waiting&lt;span style="color:#f92672"&gt;=&lt;/span&gt;report_waiting&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; standby.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;276&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000787b33 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ResolveRecoveryConflictWithVirtualXIDs (report_waiting&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;, wait_event_info&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;134217762&lt;/span&gt;, reason&lt;span style="color:#f92672"&gt;=&lt;/span&gt;PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, waitlist&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; standby.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;333&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; ResolveRecoveryConflictWithSnapshot (latestRemovedXid&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, node&lt;span style="color:#f92672"&gt;=&lt;/span&gt;...) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; standby.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;329&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000004c8ffe &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; heap_xlog_clean (record&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2366978) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; heapam.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;7764&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; heap2_redo (record&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2366978) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; heapam.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;8917&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000519e55 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; StartupXLOG () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; xlog.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;7411&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000072f211 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; StartupProcessMain () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; startup.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;204&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000005286b1 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; AuxiliaryProcessMain (argc&lt;span style="color:#f92672"&gt;=&lt;/span&gt;argc&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, argv&lt;span style="color:#f92672"&gt;=&lt;/span&gt;argv&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x7ffeb7e39d70) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; bootstrap.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000072c369 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; StartChildProcess (&lt;span style="color:#66d9ef"&gt;type&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;StartupProcess) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; postmaster.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5494&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000072eb54 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; PostmasterMain (argc&lt;span style="color:#f92672"&gt;=&lt;/span&gt;argc&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, argv&lt;span style="color:#f92672"&gt;=&lt;/span&gt;argv&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x232edb0) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; postmaster.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1407&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000004892cf &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; main (argc&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, argv&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x232edb0) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; main.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;210&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;XLOG_HEAP2_CLEAN
 &lt;div id="xlog_heap2_clean" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#xlog_heap2_clean" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;heap2_redo&lt;/span&gt;(XLogReaderState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;record)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint8		info &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;XLogRecGetInfo&lt;/span&gt;(record) &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;~&lt;/span&gt;XLR_INFO_MASK;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; XLOG_HEAP_OPMASK)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; XLOG_HEAP2_CLEAN:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;heap_xlog_clean&lt;/span&gt;(record);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Only when the redo is &lt;code&gt;XLOG_HEAP2_CLEAN&lt;/code&gt; does it enter the next function &lt;code&gt;heap_xlog_clean&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;PG 18 no longer has &lt;code&gt;XLOG_HEAP2_CLEAN&lt;/code&gt; (it was actually removed around PG15 — this article only looks at versions 13 and 18), but the define can still be found in heapam_xlog.h:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//pg13
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_CLEAN		0x10
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_FREEZE_PAGE	0x20
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_CLEANUP_INFO 0x30&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//pg18
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; There&lt;span style="color:#960050;background-color:#1e0010"&gt;&amp;#39;&lt;/span&gt;s no difference between XLOG_HEAP2_PRUNE_ON_ACCESS,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; XLOG_HEAP2_PRUNE_VACUUM_SCAN and XLOG_HEAP2_PRUNE_VACUUM_CLEANUP records.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; They have separate opcodes just &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; debugging and analysis purposes, to
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; indicate why the WAL record was emitted.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;*/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_PRUNE_ON_ACCESS		0x10
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_PRUNE_VACUUM_SCAN	0x20
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_HEAP2_PRUNE_VACUUM_CLEANUP	0x30&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I pulled out PG18&amp;rsquo;s source because PG13 (our production version) has zero explanation for these CLEAN xl_info macros, making them hard to understand. Since PG18 renamed the macros to something more intuitive and added comments, we can use PG18&amp;rsquo;s source to understand PG13&amp;rsquo;s — to figure out what this WAL record does.&lt;/p&gt;
&lt;p&gt;All three opcodes are fundamentally PRUNE-related WAL records. From the names, PRUNE_ON_ACCESS looks like pruning triggered by access, while the other two are tied to VACUUM operations.&lt;/p&gt;
&lt;p&gt;When checking with &lt;code&gt;pg_waldump&lt;/code&gt;, &lt;code&gt;rmgr: Heap2 CLEAN remxid&lt;/code&gt; records appear every few seconds, with highly varied filenodes and no relation to the static table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pg_waldump &lt;span style="color:#ae81ff"&gt;00000001000012F&lt;/span&gt;E00000001 &lt;span style="color:#f92672"&gt;|&lt;/span&gt;tail &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;egrep &lt;span style="color:#f92672"&gt;-&lt;/span&gt;i heap2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump: fatal: error in WAL record at &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F34F138: invalid resource manager ID &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; at &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F34F168
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 &lt;span style="color:#a6e22e"&gt;len&lt;/span&gt; (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot)&lt;span style="color:#f92672"&gt;:&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3520&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0F&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;346&lt;/span&gt;ED0, prev &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0F&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;346&lt;/span&gt;EA0, desc: CLEAN remxid &lt;span style="color:#ae81ff"&gt;1983744188&lt;/span&gt;, blkref &lt;span style="color:#960050;background-color:#1e0010"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; rel &lt;span style="color:#ae81ff"&gt;1663&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;88121&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1083807&lt;/span&gt; blk &lt;span style="color:#ae81ff"&gt;617606&lt;/span&gt; FPW
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 &lt;span style="color:#a6e22e"&gt;len&lt;/span&gt; (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot)&lt;span style="color:#f92672"&gt;:&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0F&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;BC60, prev &lt;span style="color:#ae81ff"&gt;12F&lt;/span&gt;E&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0F&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;BC30, desc: CLEAN remxid &lt;span style="color:#ae81ff"&gt;1984090598&lt;/span&gt;, blkref &lt;span style="color:#960050;background-color:#1e0010"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; rel &lt;span style="color:#ae81ff"&gt;1663&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;88121&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;504681&lt;/span&gt; blk &lt;span style="color:#ae81ff"&gt;1447147&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This matches our symptom pattern: no vacuum activity, but PRUNE is happening, leading into &lt;code&gt;heap_xlog_clean&lt;/code&gt; → &lt;code&gt;ResolveRecoveryConflictWithSnapshot&lt;/code&gt; and the rest of the conflict machinery.&lt;/p&gt;
&lt;p&gt;The PRUNE action producing &lt;code&gt;rmgr: Heap2 CLEAN remxid&lt;/code&gt; WAL records will be demonstrated later via testing.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s finish the source code analysis first.&lt;/p&gt;

&lt;h3 class="relative group"&gt;ResolveRecoveryConflictWithSnapshot
 &lt;div id="resolverecoveryconflictwithsnapshot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#resolverecoveryconflictwithsnapshot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ResolveRecoveryConflictWithSnapshot&lt;/span&gt;(TransactionId latestRemovedXid, RelFileNode node)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	VirtualTransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;backends;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If we get passed InvalidTransactionId then we do nothing (no conflict).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * This can happen when replaying already-applied WAL records after a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * standby crash or restart, or when replaying an XLOG_HEAP2_VISIBLE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * record that marks as frozen a page which was already all-visible. It&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * also quite common with records generated during index deletion
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * (original execution of the deletion can reason that a recovery conflict
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * which is sufficient for the deletion operation must take place before
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * replay of the deletion record itself).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(latestRemovedXid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	backends &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetConflictingVirtualXIDs&lt;/span&gt;(latestRemovedXid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 node.dbNode);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ResolveRecoveryConflictWithVirtualXIDs&lt;/span&gt;(backends,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 PROCSIG_RECOVERY_CONFLICT_SNAPSHOT,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 WAIT_EVENT_RECOVERY_CONFLICT_SNAPSHOT,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There are several types of query conflicts. &lt;code&gt;ResolveRecoveryConflictWithSnapshot&lt;/code&gt; lives up to its name — it&amp;rsquo;s a snapshot conflict.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GetConflictingVirtualXIDs&lt;/code&gt; finds which backends conflict with the snapshot. &lt;code&gt;ResolveRecoveryConflictWithVirtualXIDs&lt;/code&gt; handles the actual conflict resolution and timeout.&lt;/p&gt;

&lt;h3 class="relative group"&gt;GetConflictingVirtualXIDs
 &lt;div id="getconflictingvirtualxids" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#getconflictingvirtualxids" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;GetConflictingVirtualXIDs&lt;/code&gt; is the key function that determines whether a backend&amp;rsquo;s virtual transaction ID triggers a query conflict. It requires a bit of brainpower.&lt;/p&gt;
&lt;p&gt;Prerequisite knowledge for understanding this function:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;limitXmin&lt;/code&gt; is &lt;code&gt;latestRemovedXid&lt;/code&gt; — the &lt;code&gt;CLEAN remxid&lt;/code&gt; from WAL, the xid that needs to be cleaned up (I read remxid as &amp;ldquo;remove xid&amp;rdquo;). &lt;code&gt;/*limitXmin is supplied as either latestRemovedXid, or InvalidTransactionId*/&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PGPROC&lt;/code&gt; contains current process info: backend id, database id, lock info, and much more&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PGXACT&lt;/code&gt; contains the transaction info for the snapshot held by the current process. It&amp;rsquo;s lighter — the key field is xmin, the lowest xid the current process considers still running&lt;/li&gt;
&lt;li&gt;C&amp;rsquo;s &lt;code&gt;||&lt;/code&gt; rule: if either operand is true (non-zero), the result is true (1)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;TransactionIdIsValid&lt;/code&gt; means &lt;code&gt;xid != 0&lt;/code&gt; — 0 is meaningless&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Key function &lt;code&gt;GetConflictingVirtualXIDs&lt;/code&gt; explained:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VirtualTransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GetConflictingVirtualXIDs&lt;/span&gt;(TransactionId limitXmin, Oid dbOid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (index &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; index &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; arrayP&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;numProcs; index&lt;span style="color:#f92672"&gt;++&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// iterate all local processes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pgprocno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; arrayP&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pgprocnos[index];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		PGPROC	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;proc &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;allProcs[pgprocno]; &lt;span style="color:#75715e"&gt;// process&amp;#39;s PGPROC
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		PGXACT	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pgxact &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;allPgXact[pgprocno]; &lt;span style="color:#75715e"&gt;// process&amp;#39;s PGXACT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Exclude prepared transactions */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (proc&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// prepared transactions have no owning process — can&amp;#39;t handle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;OidIsValid&lt;/span&gt;(dbOid) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#75715e"&gt;// global tables have dbOid=0 which is invalid — satisfies condition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			proc&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;databaseId &lt;span style="color:#f92672"&gt;==&lt;/span&gt; dbOid) &lt;span style="color:#75715e"&gt;// only process current database. Cross-db is different — no transaction conflict at all.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Fetch xmin just once - can&amp;#39;t change on us, but good coding */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			TransactionId pxmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;UINT32_ACCESS_ONCE&lt;/span&gt;(pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin); &lt;span style="color:#75715e"&gt;// pgxact-&amp;gt;xmin is the minimum xid of transactions held by this process. UINT32_ACCESS_ONCE is just for atomic access protection — the xmin logic is unchanged
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * We ignore an invalid pxmin because this means that backend has
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * no snapshot currently. We hold a Share lock to avoid contention
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * with users taking snapshots. That is not a problem because the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * current xmin is always at least one higher than the latest
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * removed xid, so any new snapshot would never conflict with the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * test here.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(limitXmin) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#75715e"&gt;// limitXmin=0 possible? At least latestRemovedXid can&amp;#39;t be — I can&amp;#39;t think of a scenario where WAL would log an invalid xid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(pxmin) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdFollows&lt;/span&gt;(pxmin, limitXmin))) &lt;span style="color:#75715e"&gt;// TransactionIdIsValid(pxmin) is also not really needed. !TransactionIdFollows(pxmin, limitXmin) means pxmin &amp;lt;= limitXmin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				VirtualTransactionId vxid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;GET_VXID_FROM_PGPROC&lt;/span&gt;(vxid, &lt;span style="color:#f92672"&gt;*&lt;/span&gt;proc);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;VirtualTransactionIdIsValid&lt;/span&gt;(vxid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					vxids[count&lt;span style="color:#f92672"&gt;++&lt;/span&gt;] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; vxid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The critical line is &lt;code&gt;!TransactionIdFollows(pxmin, limitXmin)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So the core logic for determining query conflicts is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The primary&amp;rsquo;s cleaned remxid &amp;gt;= the standby query&amp;rsquo;s snapshot-held minimum xid&lt;/strong&gt; → conflict.&lt;/li&gt;
&lt;li&gt;Only kills queries in the current database; global system tables (no database) are killed indiscriminately.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This means: &lt;strong&gt;even if the pruned table on the primary has nothing to do with the table being queried on the standby, a conflict CAN occur!!!&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;In-Page Pruning
 &lt;div id="in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Now that the conflict logic is clear, we still need to understand where the WAL CLEAN records come from. That requires looking at how PRUNE is triggered.&lt;/p&gt;
&lt;p&gt;From &lt;code&gt;README.HOT&lt;/code&gt; on when pruning and defragmentation occur — &amp;ldquo;When can/should we prune or defragment?&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The currently planned heuristic is to prune and defrag when first accessing a page that potentially has prunable tuples&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Prune and defragment are indeed two distinct concepts, but they often happen together.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prune: updating line pointers to shorten HOT chains, but doesn&amp;rsquo;t free space&lt;/li&gt;
&lt;li&gt;Defragment: reclaiming space from dead line pointers and tuples after pruning&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;We cannot prune or defragment unless we can get a &amp;ldquo;buffer cleanup lock&amp;rdquo; on the target page; otherwise, pruning might destroy line pointers that other backends have live references to, and defragmenting might move tuples that other backends have live pointers to&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The page must be under a &amp;ldquo;buffer cleanup lock&amp;rdquo; for prune or defragment to occur.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The worst-case consequence of this is only that an UPDATE cannot be made HOT but has to link to a new tuple version placed on some other page, for lack of centralized space on the original page.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;A typical scenario: a HOT update spills to another page (easy to test).&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;space reclamation happens during tuple retrieval when the page is nearly full (&amp;lt;10% free) and a buffer cleanup lock can be acquired. This means that UPDATE, DELETE, and SELECT can trigger space reclamation, but often not during INSERT &amp;hellip; VALUES because it does not retrieve a row.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;SELECT/UPDATE/DELETE that scan rows can trigger space reclamation. INSERT typically won&amp;rsquo;t, since it doesn&amp;rsquo;t retrieve rows.&lt;/p&gt;
&lt;p&gt;Clearly, after prune or defragment, the corresponding xids should be reclaimed. From the README we can see that HOT updates can reproduce prune/defragment, generating CLEAN WAL records. See [Test: Pure UPDATE Produces In-Page Pruning](## Test: Pure UPDATE Produces In-Page Pruning).&lt;/p&gt;

&lt;h2 class="relative group"&gt;Testing
 &lt;div id="testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The tests below only observe whether conflicts occur, whether CLEAN WAL records appear, or whether page line pointers are updated — without distinguishing prune vs. defragment. In many cases both are triggered together; distinguishing them is tedious and maybe best left for later. The focus here is whether CLEAN WAL records appear.&lt;/p&gt;
&lt;p&gt;Helper SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--sql for test
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--heap_page_items
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags,&lt;span style="color:#66d9ef"&gt;substring&lt;/span&gt;(t_data,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--heap header
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; page_header(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--bt_page_items
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idxlzl&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--create table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl(a char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxlzl &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;md5(random()::text); &lt;span style="color:#75715e"&gt;-- non-hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--force index scan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_seqscan &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_indexonlyscan&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--open an RR transaction to hold a snapshot for observation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ISOLATION&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LEVEL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;REPEATABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;READ&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Test: Cross-Table Query Conflict
 &lt;div id="test-cross-table-query-conflict" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-cross-table-query-conflict" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;primary&lt;/th&gt;
 &lt;th&gt;standby&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;create table lzl(a bigint primary key);&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;insert into lzl values(1);&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;select 1;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;update lzl set a=2;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;no blocking&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;vacuum lzl;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;#3 ResolveRecoveryConflictWithVirtualXIDs (waitlist=0x277c340, reason=reason@entry=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, wait_event_info=wait_event_info@entry=134217762, report_waiting=report_waiting@entry=true) at standby.c:276&lt;br/&gt;#4 0x0000000000787b33 in ResolveRecoveryConflictWithVirtualXIDs (report_waiting=true, wait_event_info=134217762, reason=PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, waitlist=&lt;optimized out&gt;) at standby.c:333&lt;br/&gt;#5 ResolveRecoveryConflictWithSnapshot (latestRemovedXid=&lt;optimized out&gt;, node=&amp;hellip;) at standby.c:329&lt;br/&gt;#6 0x00000000004c8ffe in heap_xlog_clean (record=0x273a258) at heapam.c:7764&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: As long as a query exists, it has a snapshot, and a snapshot has a snapshot xmin. Even if the queried table is completely unrelated, a query conflict CAN occur.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: Vacuum Produces In-Page Pruning
 &lt;div id="test-vacuum-produces-in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-vacuum-produces-in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Pruning occurs, conflicts occur. Example omitted — not relevant to this case.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: UPDATE Produces In-Page Pruning
 &lt;div id="test-update-produces-in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-update-produces-in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--HOT, off-page update triggers defragment
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--An 8k heap page stores 4-2xx rows. Here we size rows so 4 fit and remain HOT — the next update spills off-page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl(a char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; idxlzl &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--heap page: 4 rows, all HOT:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+----------+----------+-------+----------------------------------------------------------------------------------------------------------+----------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954161&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954162&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954162&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954163&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954163&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954164&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954164&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--index: only one entry:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+---------+-------+------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;48&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--One more update triggers off-page update
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--page full, can&amp;#39;t HOT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--HOT chain changed. LP changed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-------------+----------+----------+--------+--------------------------------------------------------------------------------------+----------------+---------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_REDIRECT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954165&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954164&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954165&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--index: still only one entry, unchanged:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+---------+-------+------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;48&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The next update doesn&amp;rsquo;t go to a new page — instead, in-page pruning happens first, freeing space on the same page, so the row is written locally. This saves a page access.&lt;/p&gt;
&lt;p&gt;WAL produces CLEAN remxid, confirming that a query conflict can occur:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 62/ 62, tx: 0, lsn: 3DB/F8017348, prev 3DB/F8017310, desc: CLEAN remxid 34954177, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/5893914/5893920 blk 0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 2070/ 2070, tx: 34954178, lsn: 3DB/F8017388, prev 3DB/F8017348, desc: HOT_UPDATE off &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;34954178&lt;/span&gt; flags 0x10 ; new off &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; xmax 0, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/5893914/5893920 blk 0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: UPDATE statements can produce in-page pruning and can cause query conflicts.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: Hint-Bit Writeback Producing In-Page Pruning?
 &lt;div id="test-hint-bit-writeback-producing-in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-hint-bit-writeback-producing-in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;primary&lt;/th&gt;
 &lt;th&gt;standby&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;wal_log_hints=on&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;truncate table lzl;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;insert into lzl values(&amp;lsquo;z&amp;rsquo;);&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;select * from lzl;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;delete from lzl where a=&amp;lsquo;z&amp;rsquo;;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;checkpoint;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;select * from lzl;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&amp;ndash;WAL contains FPI_FOR_HINT&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&amp;ndash;no query conflict&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Standby pageinspect:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;substring&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+----------+----------+-------+------------------------------------------------------------------------------+----------------+-------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954229&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954230&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a202020202020202020202020202020202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: WAL log hints only sync hint bits and don&amp;rsquo;t affect xmin/xmax. No CLEAN or similar records are produced, so hint-bit writeback does NOT cause query conflicts.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: SELECT Produces In-Page Pruning
 &lt;div id="test-select-produces-in-page-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-select-produces-in-page-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;SELECT normally doesn&amp;rsquo;t cause pruning, but it does when the page is nearly full: &lt;a href="https://www.modb.pro/db/1683648157451362304" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/1683648157451362304&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Testing pruning on a full page:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Same table as before, 4 HOT rows, nearly full
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--page at this point:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+----------+----------+-------+----------------------------------------------------------------------------------------------------------+----------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954232&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954233&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954233&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954234&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954234&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954235&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954235&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- A SELECT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--page now shows in-page pruning:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sub
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-------------+----------+--------+--------+---------------------------------------------------------------------------------------+----------------+---------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_REDIRECT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34954235&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x501f00007a20202020202020202020202020&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: SELECT can produce in-page pruning and can cause query conflicts.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: Shared Table Cross-Database Query Conflict
 &lt;div id="test-shared-table-cross-database-query-conflict" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-shared-table-cross-database-query-conflict" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Shared tables are global. Earlier in &lt;code&gt;GetConflictingVirtualXIDs&lt;/code&gt; we saw that global tables are killed indiscriminately. Let&amp;rsquo;s test.&lt;/p&gt;
&lt;p&gt;Shared table info:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Source&lt;/span&gt; definition: IsSharedRelation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Source&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;check&lt;/span&gt;: shared &lt;span style="color:#f92672"&gt;?&lt;/span&gt; InvalidOid : MyDatabaseId;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt;: pg_class.relisshared
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Directory: &lt;span style="color:#66d9ef"&gt;global&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Querying &lt;code&gt;pg_class.relisshared&lt;/code&gt; directly is easier:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relkind,relisshared &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relisshared &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;true&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; relkind&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relkind &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relisshared
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------+---------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_authid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_subscription &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_database &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_db_role_setting &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_tablespace &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_auth_members &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_shdepend &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_shdescription &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_replication_origin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_shseclabel &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;pg_authid&lt;/code&gt; stores role/user info. Testing with a password change:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Test: on the primary, in a non-business database
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; lzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; password &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;--run several times&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CLEAN remxid appears:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;34954264&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D0F8, prev &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D0B8, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: HOT_UPDATE &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;67&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;34954264&lt;/span&gt; flags &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x20 ; &lt;span style="color:#66d9ef"&gt;new&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, blkref &lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;: rel &lt;span style="color:#ae81ff"&gt;1664&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1260&lt;/span&gt; blk &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: &lt;span style="color:#66d9ef"&gt;Transaction&lt;/span&gt; len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;82&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;82&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;34954264&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D148, prev &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D0F8, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;680782&lt;/span&gt; CST; inval msgs: catcache &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D1A0, prev &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D148, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: CLEAN remxid &lt;span style="color:#ae81ff"&gt;34954264&lt;/span&gt;, blkref &lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;: rel &lt;span style="color:#ae81ff"&gt;1664&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1260&lt;/span&gt; blk &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap2 len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;34954265&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;DB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;F808D1E0,&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The standby business database&amp;rsquo;s &lt;code&gt;select 1&lt;/code&gt; query was killed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conclusion: Shared tables can cause cross-database query conflicts.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That said, these shared system tables rarely see heavy updates in normal operations.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Conclusions
 &lt;div id="conclusions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#conclusions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Developer Perspective
 &lt;div id="developer-perspective" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#developer-perspective" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Query conflicts can be completely unrelated to the table being queried — meaning a fully static table CAN experience conflicts.&lt;/p&gt;
&lt;p&gt;Cross-database means different business domains and data. Cross-database does NOT cause query conflicts. The one exception is shared tables, but these are just a handful of system tables that rarely see updates.&lt;/p&gt;
&lt;p&gt;For developers, focus on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Retry on failure&lt;/strong&gt;: Standby queries can be killed — retrying is essential, and retries may succeed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query duration&lt;/strong&gt;: Longer queries are more likely to be killed&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Alternative standbys&lt;/strong&gt;: Consider using a different standby with lower disaster-recovery requirements&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Operations Perspective
 &lt;div id="operations-perspective" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#operations-perspective" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since query conflicts can come from &amp;ldquo;all directions,&amp;rdquo; a simple long-running single-table query can be killed by in-page pruning on a completely different, frequently-updated table. You can increase &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; to reduce conflict probability.&lt;/p&gt;
&lt;p&gt;However, &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; trades off against WAL apply — a longer delay means WAL application is paused. This parameter&amp;rsquo;s value directly represents the maximum possible standby replication lag (it can&amp;rsquo;t cap lag from network or other factors).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Query freshness&lt;/strong&gt;: Prolonged WAL apply pauses mean the standby data lags significantly (WAL may already be on the standby&amp;rsquo;s disk), affecting data freshness requirements for other standby queries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RTO&lt;/strong&gt;: If the primary suffers a disaster and failover is needed, the standby must apply accumulated WAL. If apply delay stretches to hours, it may violate minute-level RTO SLAs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So tuning &lt;code&gt;max_standby_streaming_delay&lt;/code&gt; is a delicate exercise requiring consideration of the standby&amp;rsquo;s role, query freshness requirements, and even geography.&lt;/p&gt;</content:encoded></item><item><title>Parameters on the Control File and Primary-Standby Parameter Mismatch Issues</title><link>https://lastdba.com/en/2025/08/25/parameters-on-the-control-file-and-primary-standby-parameter-mismatch-issues/</link><pubDate>Mon, 25 Aug 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/08/25/parameters-on-the-control-file-and-primary-standby-parameter-mismatch-issues/</guid><description>&lt;h3 class="relative group"&gt;PARAMETER_CHANGE and Database Parameters on the Control File
 &lt;div id="parameter_change-and-database-parameters-on-the-control-file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#parameter_change-and-database-parameters-on-the-control-file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Some PG parameters affect the standby&amp;rsquo;s operation. These parameters are not only in the configuration file but also written to the control file. Whenever parameters change, they are written to WAL and update the control file.&lt;/p&gt;
&lt;p&gt;The standby redoes the &lt;code&gt;PARAMETER_CHANGE&lt;/code&gt; WAL record and writes to the standby&amp;rsquo;s control file.
&lt;code&gt;PARAMETER_CHANGE&lt;/code&gt; WAL record:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: XLOG len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 54/ 54, tx: 0, lsn: 27F/800001C0, prev 27F/80000148, desc: PARAMETER_CHANGE max_connections&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt; max_worker_processes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; max_wal_senders&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; max_prepared_xacts&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; max_locks_per_xact&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt; wal_level&lt;span style="color:#f92672"&gt;=&lt;/span&gt;logical wal_log_hints&lt;span style="color:#f92672"&gt;=&lt;/span&gt;off track_commit_timestamp&lt;span style="color:#f92672"&gt;=&lt;/span&gt;on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;XLOG_PARAMETER_CHANGE&lt;/code&gt; records these 8 parameters, which can also be viewed directly from the control file:&lt;/p&gt;</description><content:encoded>
&lt;h3 class="relative group"&gt;PARAMETER_CHANGE and Database Parameters on the Control File
 &lt;div id="parameter_change-and-database-parameters-on-the-control-file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#parameter_change-and-database-parameters-on-the-control-file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Some PG parameters affect the standby&amp;rsquo;s operation. These parameters are not only in the configuration file but also written to the control file. Whenever parameters change, they are written to WAL and update the control file.&lt;/p&gt;
&lt;p&gt;The standby redoes the &lt;code&gt;PARAMETER_CHANGE&lt;/code&gt; WAL record and writes to the standby&amp;rsquo;s control file.
&lt;code&gt;PARAMETER_CHANGE&lt;/code&gt; WAL record:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: XLOG len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 54/ 54, tx: 0, lsn: 27F/800001C0, prev 27F/80000148, desc: PARAMETER_CHANGE max_connections&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt; max_worker_processes&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; max_wal_senders&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; max_prepared_xacts&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; max_locks_per_xact&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt; wal_level&lt;span style="color:#f92672"&gt;=&lt;/span&gt;logical wal_log_hints&lt;span style="color:#f92672"&gt;=&lt;/span&gt;off track_commit_timestamp&lt;span style="color:#f92672"&gt;=&lt;/span&gt;on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;XLOG_PARAMETER_CHANGE&lt;/code&gt; records these 8 parameters, which can also be viewed directly from the control file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_controldata |grep setting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_level setting: logical
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_log_hints setting: on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_connections setting: &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_worker_processes setting: &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_wal_senders setting: &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_prepared_xacts setting: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_locks_per_xact setting: &lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;track_commit_timestamp setting: on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;These parameters are all from the primary, even if this control file belongs to the standby.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The startup process checks 6 parameters via the &lt;code&gt;CheckRequiredParameterValues&lt;/code&gt; function. One parameter &lt;code&gt;wal_level&lt;/code&gt; must be &lt;code&gt;&amp;gt;= replica&lt;/code&gt;. The other 5 parameters — &lt;code&gt;max_connections&lt;/code&gt;, &lt;code&gt;max_worker_processes&lt;/code&gt;, &lt;code&gt;max_wal_senders&lt;/code&gt;, &lt;code&gt;max_prepared_transactions&lt;/code&gt;, &lt;code&gt;max_locks_per_transaction&lt;/code&gt; — are checked for primary vs standby sizing. If the standby has a smaller value, recovery is paused. If you increase the primary&amp;rsquo;s parameters directly, the standby will crash. The PG log:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FATAL,&lt;span style="color:#ae81ff"&gt;22023&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;hot standby is not possible because max_connections = 2000 is a lower setting than on the master server (its value was 3000)&amp;#34;&lt;/span&gt;,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;WAL redo at 27F/800001C0 for XLOG/PARAMETER_CHANGE: max_connections=3000 max_worker_processes=20 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=1024 wal_level=logical wal_log_hints=off track_commit_timestamp=on&amp;#34;&lt;/span&gt;,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;startup&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;6 of the 8 parameters can seriously affect standby operation. The other 2 parameters — &lt;code&gt;wal_log_hints&lt;/code&gt;, &lt;code&gt;track_commit_timestamp&lt;/code&gt; — are not immediately checked by the startup process. All 8 parameters being synchronized to the control file serve their own purposes.&lt;/p&gt;

&lt;h3 class="relative group"&gt;wal_log_hints Primary-Standby Mismatch
 &lt;div id="wal_log_hints-primary-standby-mismatch" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal_log_hints-primary-standby-mismatch" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Changes to &lt;code&gt;wal_log_hints&lt;/code&gt; are recorded in WAL logs. Although not checked by the startup process, pg_rewind does check it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;perform_rewind&lt;/span&gt;(...)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Target cluster need to use checksums or hint bit wal-logging, this to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * prevent from data corruption that could occur because of hint bits.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ControlFile_target.data_checksum_version &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; PG_DATA_CHECKSUM_VERSION &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#f92672"&gt;!&lt;/span&gt;ControlFile_target.wal_log_hints)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pg_fatal&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;target server needs to use either data checksums or &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;wal_log_hints = on&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since &lt;code&gt;wal_log_hints&lt;/code&gt; is WAL-related, it doesn&amp;rsquo;t make sense for pg_rewind to check whether the standby&amp;rsquo;s &lt;code&gt;wal_log_hints&lt;/code&gt; is enabled — it should check whether the primary&amp;rsquo;s &lt;code&gt;wal_log_hints&lt;/code&gt; is enabled. Therefore, PG synchronizes the &lt;code&gt;wal_log_hints&lt;/code&gt; parameter to the standby&amp;rsquo;s control file, which is very reasonable.&lt;/p&gt;
&lt;p&gt;wal_log_hints primary-standby mismatch test:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;eee&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- observation point 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- ignore this online checkpoint wal record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1; &lt;span style="color:#75715e"&gt;-- observation point 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- observation action
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump &lt;span style="color:#ae81ff"&gt;000000020000027&lt;/span&gt;F0000000A&lt;span style="color:#f92672"&gt;|&lt;/span&gt;tail &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- observing option
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags,&lt;span style="color:#66d9ef"&gt;substring&lt;/span&gt;(t_data,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;t1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;on, on:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Observation point 1:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 85/ 208, tx: 11140182, lsn: 27F/5000CC38, prev 27F/5000CBC0, desc: HOT_UPDATE off &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;11140182&lt;/span&gt; flags 0x10 ; new off &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; xmax 0, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/7472552/7472597 blk 0 FPW&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Transaction len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 46/ 46, tx: 11140182, lsn: 27F/5000CD08, prev 27F/5000CC38, desc: COMMIT 2025-07-21 18:28:13.292397 CST
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/5000CD38, prev 27F/5000CD08, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140183&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140182&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140183&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Observation point 2:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: XLOG len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 51/ 171, tx: 0, lsn: 27F/58000110, prev 27F/580000D8, desc: FPI_FOR_HINT , blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/7472552/7472597 blk 0 FPW&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/580001C0, prev 27F/58000110, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140183&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140182&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140183&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;off, off:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Observation point 1:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 85/ 225, tx: 11140183, lsn: 27F/580003C8, prev 27F/58000390, desc: HOT_UPDATE off &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;11140183&lt;/span&gt; flags 0x10 ; new off &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; xmax 0, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/7472552/7472597 blk 0 FPW&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Transaction len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 46/ 46, tx: 11140183, lsn: 27F/580004B0, prev 27F/580003C8, desc: COMMIT 2025-07-21 18:33:18.192146 CST
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/580004E0, prev 27F/580004B0, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140184&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140183&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140184&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/58000518, prev 27F/580004E0, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140184&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140183&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140184&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Observation point 2:&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;on, off:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Observation point 1:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 85/ 274, tx: 11140186, lsn: 27F/58000C18, prev 27F/58000BA0, desc: HOT_UPDATE off &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;11140186&lt;/span&gt; flags 0x10 ; new off &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; xmax 0, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/7472552/7472597 blk 0 FPW&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Transaction len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 46/ 46, tx: 11140186, lsn: 27F/58000D30, prev 27F/58000C18, desc: COMMIT 2025-07-21 18:40:17.638691 CST
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/58000D60, prev 27F/58000D30, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140186&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/58000D98, prev 27F/58000D60, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140186&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Observation point 2:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: XLOG len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 51/ 236, tx: 0, lsn: 27F/58000E48, prev 27F/58000DD0, desc: FPI_FOR_HINT , blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/7472552/7472597 blk 0 FPW&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/58000F38, prev 27F/58000E48, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140186&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;off, on:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Observation point 1:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/58001108, prev 27F/58001090, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140186&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 85/ 289, tx: 11140187, lsn: 27F/58001140, prev 27F/58001108, desc: HOT_UPDATE off &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt; flags 0x10 ; new off &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; xmax 0, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/7472552/7472597 blk 0 FPW&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Transaction len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 46/ 46, tx: 11140187, lsn: 27F/58001268, prev 27F/58001140, desc: COMMIT 2025-07-21 18:44:08.550109 CST
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 54/ 54, tx: 0, lsn: 27F/58001298, prev 27F/58001268, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140188&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140186&lt;/span&gt; oldestRunningXid 11140187; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; xacts: &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 27F/580012D0, prev 27F/58001298, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;11140188&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;11140187&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;11140188&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Observation point 2:&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Test summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;FPI_FOR_HINT is produced when hint bits are written back; SELECT queries can produce FPI_FOR_HINT.&lt;/li&gt;
&lt;li&gt;Regardless of the standby setting (on or off), when the primary is on, FPI_FOR_HINT will be produced.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Additional Knowledge: What is XLOG_RUNNING_XACTS
 &lt;div id="additional-knowledge-what-is-xlog_running_xacts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#additional-knowledge-what-is-xlog_running_xacts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;XLOG_RUNNING_XACTS is one type of RM_STANDBY_ID:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * XLOG message types
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_STANDBY_LOCK			0x00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_RUNNING_XACTS			0x10
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLOG_INVALIDATIONS			0x20&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;XLOG_STANDBY_LOCK&lt;/code&gt;: Records acquisition and release of AccessExclusiveLock, used by standby nodes to recognize lock states.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;XLOG_RUNNING_XACTS&lt;/code&gt;: Running-xacts snapshots used for building snapshots to ensure transaction consistency.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;XLOG_INVALIDATIONS&lt;/code&gt;: INVALIDATIONS messages for synchronizing metadata information to local backends.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; standbydefs.h
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 Frontend exposed definitions &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; hot standby mode.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RM_STANDBY_ID&lt;/code&gt; is an rmgr specifically defined for hot standby read-only standbys. For local instance recovery and logical decoding scenarios that need WAL, &lt;code&gt;RM_STANDBY_ID&lt;/code&gt; is essentially meaningless to them.&lt;/p&gt;
&lt;p&gt;Observing WAL records during transaction commit:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;command&lt;/th&gt;
 &lt;th&gt;wal record&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;begin;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;select * from txid_current(); &amp;ndash;11140191&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;commit;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;rmgr: Transaction&lt;/strong&gt; len (rec/tot): 46/ 46, &lt;strong&gt;tx: 11140191&lt;/strong&gt;, lsn: 27F/80000538, prev 27F/80000500, desc: COMMIT 2025-07-23 11:16:10.872724 CST&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;rmgr: Standby&lt;/strong&gt; len (rec/tot): 50/ 50, tx: 0, lsn: 27F/80000568, prev 27F/80000538, desc: RUNNING_XACTS &lt;strong&gt;nextXid 11140192 latestCompletedXid 11140191&lt;/strong&gt; oldestRunningXid 11140192&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The transaction ID itself — commit or abort — is synchronized by rmgr: Transaction. Snapshots are synchronized via rmgr: Standby RUNNING_XACTS.&lt;/p&gt;

&lt;h3 class="relative group"&gt;track_commit_timestamp Primary-Standby Mismatch
 &lt;div id="track_commit_timestamp-primary-standby-mismatch" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#track_commit_timestamp-primary-standby-mismatch" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;track_commit_timestamp&lt;/code&gt;: the startup process activates the standby&amp;rsquo;s commit_ts functionality upon receiving the corresponding WAL, primarily for viewing xid commit times on the standby:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Activate or deactivate CommitTs&amp;#39; upon reception of a XLOG_PARAMETER_CHANGE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * XLog record during recovery.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CommitTsParameterChange&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; newvalue, &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; oldvalue)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If the commit_ts module is disabled in this server and we get word from
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the primary server that it is enabled there, activate it so that we can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * replay future WAL records involving it; also mark it as active on
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * pg_control. If the old value was already set, we already did this, so
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * don&amp;#39;t do anything.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If the module is disabled in the primary, disable it here too, unless
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the module is enabled locally.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Note this only runs in the recovery process, so an unlocked read is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * fine.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (newvalue)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;commitTsShared&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;commitTsActive)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ActivateCommitTs&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (commitTsShared&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;commitTsActive)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;DeactivateCommitTs&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;track_commit_timestamp primary-standby mismatch test:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Initial state: primary=on, standby=on. Both can use &lt;code&gt;committed_xact&lt;/code&gt; and similar functions.&lt;/li&gt;
&lt;li&gt;primary=off (restart primary), standby=on (no change). Both cannot use &lt;code&gt;committed_xact&lt;/code&gt; and similar functions.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;After modifying and restarting the primary, standby replication remains normal, but &lt;code&gt;committed_xact&lt;/code&gt; and similar functions are unusable:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_last_committed_xact();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;: could &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;get&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Make sure the configuration &lt;span style="color:#66d9ef"&gt;parameter&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;track_commit_timestamp&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; server.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: error_commit_ts_disabled, commit_ts.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;385&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; track_commit_timestamp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; track_commit_timestamp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;q
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; pg_controldata &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep track_commit_timestamp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;track_commit_timestamp setting: &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;PG14+ Pause Recovery
 &lt;div id="pg14-pause-recovery" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg14-pause-recovery" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG14 improved the behavior when primary parameter changes cause standby crashes. When parameters don&amp;rsquo;t meet conditions, instead of the read-only standby directly crashing, it now only pauses replication. See &lt;code&gt;RecoveryRequiresIntParameter&lt;/code&gt;.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Pause recovery on a hot standby server if the primary changes its parameters in a way that prevents replay on the standby (Peter Eisentraut)&lt;/p&gt;
&lt;p&gt;Previously the standby would shut down immediately&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Testing PG14 parameter changes causing standby replication interruption:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;337&lt;/span&gt; CST,,,&lt;span style="color:#ae81ff"&gt;141823&lt;/span&gt;,,&lt;span style="color:#ae81ff"&gt;6880&lt;/span&gt;ca5f.&lt;span style="color:#ae81ff"&gt;229&lt;/span&gt;ff,&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;,,&lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;recovery has paused&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;If recovery is unpaused, the server will shut down.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;You can then restart the server after making the necessary configuration changes.&amp;#34;&lt;/span&gt;,,,&lt;span style="color:#e6db74"&gt;&amp;#34;WAL redo at 281/78324BE8 for XLOG/PARAMETER_CHANGE: max_connections=2000 max_worker_processes=20 max_wal_senders=10 max_prepared_xacts=0 max_locks_per_xact=1024 wal_level=logical wal_log_hints=on track_commit_timestamp=on&amp;#34;&lt;/span&gt;,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;startup&amp;#34;&lt;/span&gt;,,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since replication has already stopped, changing the primary&amp;rsquo;s parameters back won&amp;rsquo;t help — the standby can&amp;rsquo;t apply subsequent changes and update the control file. So you &lt;em&gt;must&lt;/em&gt; modify the standby&amp;rsquo;s parameters and restart (the log hint is also quite clear).&lt;/p&gt;

&lt;h3 class="relative group"&gt;Summary of the 8 Parameters
 &lt;div id="summary-of-the-8-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary-of-the-8-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When any of the 8 parameters are modified on the primary and the primary is restarted, the local control file is updated. If parameters have changed, the updated parameters are written to WAL and synchronized to downstream. The downstream redoes this PARAMETER_CHANGE WAL record, updating its local control file. The standby then determines whether primary-standby replication or other functions are available based on certain conditions.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;8 Parameters Written to Control File&lt;/th&gt;
 &lt;th&gt;Check&lt;/th&gt;
 &lt;th&gt;If not, standby (PG13-)&lt;/th&gt;
 &lt;th&gt;If not, standby (PG14+)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;wal_level&lt;/td&gt;
 &lt;td&gt;!=minimal&lt;/td&gt;
 &lt;td&gt;Cannot sync, fundamental&lt;/td&gt;
 &lt;td&gt;Cannot sync, fundamental&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_connections&lt;/td&gt;
 &lt;td&gt;primary &amp;lt;= standby&lt;/td&gt;
 &lt;td&gt;hot standby shutdown&lt;/td&gt;
 &lt;td&gt;hot standby pause replication&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_worker_processes&lt;/td&gt;
 &lt;td&gt;primary &amp;lt;= standby&lt;/td&gt;
 &lt;td&gt;hot standby shutdown&lt;/td&gt;
 &lt;td&gt;hot standby pause replication&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_wal_senders&lt;/td&gt;
 &lt;td&gt;primary &amp;lt;= standby&lt;/td&gt;
 &lt;td&gt;hot standby shutdown&lt;/td&gt;
 &lt;td&gt;hot standby pause replication&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_prepared_transactions&lt;/td&gt;
 &lt;td&gt;primary &amp;lt;= standby&lt;/td&gt;
 &lt;td&gt;hot standby shutdown&lt;/td&gt;
 &lt;td&gt;hot standby pause replication&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_locks_per_transaction&lt;/td&gt;
 &lt;td&gt;primary &amp;lt;= standby&lt;/td&gt;
 &lt;td&gt;hot standby shutdown&lt;/td&gt;
 &lt;td&gt;hot standby pause replication&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;wal_log_hints&lt;/td&gt;
 &lt;td&gt;pg_rewind prerequisite (either data checksums or wal_log_hints = on)&lt;/td&gt;
 &lt;td&gt;Doesn&amp;rsquo;t affect standby sync&lt;/td&gt;
 &lt;td&gt;Doesn&amp;rsquo;t affect standby sync&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;track_commit_timestamp&lt;/td&gt;
 &lt;td&gt;Enable/disable standby commit_ts functionality&lt;/td&gt;
 &lt;td&gt;Doesn&amp;rsquo;t affect standby sync&lt;/td&gt;
 &lt;td&gt;Doesn&amp;rsquo;t affect standby sync&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;p&gt;Special thanks to: Gao Changjun&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL DDL Pitfalls and Clever Solutions</title><link>https://lastdba.com/en/2025/07/19/postgresql-ddl-pitfalls-and-clever-solutions/</link><pubDate>Sat, 19 Jul 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/07/19/postgresql-ddl-pitfalls-and-clever-solutions/</guid><description>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5f610ac9b703.png" alt="DDL Pitfalls and Solutions" /&gt;&lt;/p&gt;
&lt;p&gt;Save it, use it freely, no need to ask.&lt;/p&gt;
&lt;p&gt;May be updated, may not be.&lt;/p&gt;
&lt;p&gt;Feedback welcome — pick it apart if you can.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</description><content:encoded>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5f610ac9b703.png" alt="DDL Pitfalls and Solutions" /&gt;&lt;/p&gt;
&lt;p&gt;Save it, use it freely, no need to ask.&lt;/p&gt;
&lt;p&gt;May be updated, may not be.&lt;/p&gt;
&lt;p&gt;Feedback welcome — pick it apart if you can.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This article was originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>Case: GRANT Authorization Causes Walsender to Hang</title><link>https://lastdba.com/en/2025/06/26/case-grant-authorization-causes-walsender-to-hang/</link><pubDate>Thu, 26 Jun 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/06/26/case-grant-authorization-causes-walsender-to-hang/</guid><description>&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The walsender&amp;rsquo;s LSN stopped advancing. The stack trace showed it was stuck in pathman&amp;rsquo;s &lt;code&gt;invalidate_psin_entries_using_relid&lt;/code&gt;, with the relid constantly changing and the walsender CPU pegged at 100%.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;121327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 hash_seq_search (status=status@entry=0x7fffaadf8330) at dynahash.c:1441
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002ba3b40ec728 in invalidate_psin_entries_using_relid (relid=relid@entry=42319501) at src/relation_info.c:251
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002ba3b40ecb3d in forget_status_of_relation (relid=relid@entry=42319501) at src/relation_info.c:232
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00002ba3b40fcc96 in pathman_relcache_hook (arg=&amp;lt;optimized out&amp;gt;, relid=42319501) at src/hooks.c:934
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000087168a in LocalExecuteInvalidationMessage (msg=0x3a391c8) at inval.c:595
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x000000000071d50e in ReorderBufferExecuteInvalidations (rb=0x1b63ff8, txn=0x1be5f58, txn=0x1be5f58) at reorderbuffer.c:2238
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 ReorderBufferCommit (rb=0x1b63ff8, xid=xid@entry=4285897514, commit_lsn=405674661986920, end_lsn=&amp;lt;optimized out&amp;gt;, commit_time=commit_time@entry=799377897828299, origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at reorderbuffer.c:1819
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x0000000000712d18 in DecodeCommit (xid=4285897514, parsed=0x7fffaadf8630, buf=0x7fffaadf87f0, ctx=0x1a359e8) at decode.c:637
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 DecodeXactOp (ctx=0x1a359e8, buf=buf@entry=0x7fffaadf87f0) at decode.c:245
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#9 0x00000000007130b2 in LogicalDecodingProcessRecord (ctx=0x1a359e8, record=0x1a35c80) at decode.c:114
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#10 0x0000000000733662 in XLogSendLogical () at walsender.c:2885
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#11 0x0000000000735942 in WalSndLoop (send_data=send_data@entry=0x733620 &amp;lt;XLogSendLogical&amp;gt;) at walsender.c:2287
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#12 0x0000000000736692 in StartLogicalReplication (cmd=0x1846c68) at walsender.c:1213
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#13 exec_replication_command (cmd_string=cmd_string@entry=0x181a288 &amp;#34;START_REPLICATION SLOT \&amp;#34;lzl_logical_rep\&amp;#34; LOGICAL 170F5/7C3EAE78 (\&amp;#34;proto_version\&amp;#34; &amp;#39;1&amp;#39;, \&amp;#34;publication_names\&amp;#34; &amp;#39;lzl_logical_rep&amp;#39;)&amp;#34;) at walsender.c:1640
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#14 0x0000000000774e91 in PostgresMain (argc=&amp;lt;optimized out&amp;gt;, argv=argv@entry=0x1866478, dbname=0x18662b8 &amp;#34;lzldb&amp;#34;, username=&amp;lt;optimized out&amp;gt;) at postgres.c:4325
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#15 0x0000000000485989 in BackendRun (port=&amp;lt;optimized out&amp;gt;, port=&amp;lt;optimized out&amp;gt;) at postmaster.c:4526
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#16 BackendStartup (port=0x18635b0) at postmaster.c:4210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#17 ServerLoop () at postmaster.c:1739
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#18 0x0000000000702f08 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1814da0) at postmaster.c:1412
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#19 0x000000000048660a in main (argc=3, argv=0x1814da0) at main.c:210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Second execution, same stack, different relid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;121327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 hash_seq_search (status=status@entry=0x7fffaadf8330) at dynahash.c:1441
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002ba3b40ec728 in invalidate_psin_entries_using_relid (relid=relid@entry=26560221) at src/relation_info.c:251
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002ba3b40ecb3d in forget_status_of_relation (relid=relid@entry=26560221) at src/relation_info.c:232
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00002ba3b40fcc96 in pathman_relcache_hook (arg=&amp;lt;optimized out&amp;gt;, relid=26560221) at src/hooks.c:934
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000087168a in LocalExecuteInvalidationMessage (msg=0x39f1f68) at inval.c:595
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The changing relid showed that the walsender was still running, not dead. The LSN was not advancing, so we analyzed the LSN position to see what the transaction was doing.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Symptoms
 &lt;div id="symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The walsender&amp;rsquo;s LSN stopped advancing. The stack trace showed it was stuck in pathman&amp;rsquo;s &lt;code&gt;invalidate_psin_entries_using_relid&lt;/code&gt;, with the relid constantly changing and the walsender CPU pegged at 100%.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;121327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 hash_seq_search (status=status@entry=0x7fffaadf8330) at dynahash.c:1441
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002ba3b40ec728 in invalidate_psin_entries_using_relid (relid=relid@entry=42319501) at src/relation_info.c:251
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002ba3b40ecb3d in forget_status_of_relation (relid=relid@entry=42319501) at src/relation_info.c:232
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00002ba3b40fcc96 in pathman_relcache_hook (arg=&amp;lt;optimized out&amp;gt;, relid=42319501) at src/hooks.c:934
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000087168a in LocalExecuteInvalidationMessage (msg=0x3a391c8) at inval.c:595
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x000000000071d50e in ReorderBufferExecuteInvalidations (rb=0x1b63ff8, txn=0x1be5f58, txn=0x1be5f58) at reorderbuffer.c:2238
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 ReorderBufferCommit (rb=0x1b63ff8, xid=xid@entry=4285897514, commit_lsn=405674661986920, end_lsn=&amp;lt;optimized out&amp;gt;, commit_time=commit_time@entry=799377897828299, origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at reorderbuffer.c:1819
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x0000000000712d18 in DecodeCommit (xid=4285897514, parsed=0x7fffaadf8630, buf=0x7fffaadf87f0, ctx=0x1a359e8) at decode.c:637
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 DecodeXactOp (ctx=0x1a359e8, buf=buf@entry=0x7fffaadf87f0) at decode.c:245
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#9 0x00000000007130b2 in LogicalDecodingProcessRecord (ctx=0x1a359e8, record=0x1a35c80) at decode.c:114
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#10 0x0000000000733662 in XLogSendLogical () at walsender.c:2885
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#11 0x0000000000735942 in WalSndLoop (send_data=send_data@entry=0x733620 &amp;lt;XLogSendLogical&amp;gt;) at walsender.c:2287
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#12 0x0000000000736692 in StartLogicalReplication (cmd=0x1846c68) at walsender.c:1213
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#13 exec_replication_command (cmd_string=cmd_string@entry=0x181a288 &amp;#34;START_REPLICATION SLOT \&amp;#34;lzl_logical_rep\&amp;#34; LOGICAL 170F5/7C3EAE78 (\&amp;#34;proto_version\&amp;#34; &amp;#39;1&amp;#39;, \&amp;#34;publication_names\&amp;#34; &amp;#39;lzl_logical_rep&amp;#39;)&amp;#34;) at walsender.c:1640
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#14 0x0000000000774e91 in PostgresMain (argc=&amp;lt;optimized out&amp;gt;, argv=argv@entry=0x1866478, dbname=0x18662b8 &amp;#34;lzldb&amp;#34;, username=&amp;lt;optimized out&amp;gt;) at postgres.c:4325
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#15 0x0000000000485989 in BackendRun (port=&amp;lt;optimized out&amp;gt;, port=&amp;lt;optimized out&amp;gt;) at postmaster.c:4526
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#16 BackendStartup (port=0x18635b0) at postmaster.c:4210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#17 ServerLoop () at postmaster.c:1739
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#18 0x0000000000702f08 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1814da0) at postmaster.c:1412
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#19 0x000000000048660a in main (argc=3, argv=0x1814da0) at main.c:210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Second execution, same stack, different relid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;121327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 hash_seq_search (status=status@entry=0x7fffaadf8330) at dynahash.c:1441
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002ba3b40ec728 in invalidate_psin_entries_using_relid (relid=relid@entry=26560221) at src/relation_info.c:251
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002ba3b40ecb3d in forget_status_of_relation (relid=relid@entry=26560221) at src/relation_info.c:232
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00002ba3b40fcc96 in pathman_relcache_hook (arg=&amp;lt;optimized out&amp;gt;, relid=26560221) at src/hooks.c:934
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000087168a in LocalExecuteInvalidationMessage (msg=0x39f1f68) at inval.c:595
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The changing relid showed that the walsender was still running, not dead. The LSN was not advancing, so we analyzed the LSN position to see what the transaction was doing.&lt;/p&gt;
&lt;p&gt;If the slot information was still available, we could look up the restart LSN via the slot view to find the WAL position. If not, we could use the LSN from the stack trace to identify the WAL log.&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;pg_waldump&lt;/code&gt; to inspect WAL log entries, filtering by xid:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Heap len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 961/ 961, tx: 4285897514, lsn: 170F5/7DFE3470, prev 170F5/7DFE3430, desc: UPDATE+INIT off &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; xmax &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; flags 0x00 ; new off &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; xmax 0, blkref &lt;span style="color:#75715e"&gt;#0: rel 1663/17662/1259 blk 8443, blkref #1: rel 1663/17662/1259 blk 7327&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Transaction len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 1778325/1778325, tx: 4285897514, lsn: 170F5/7E1F4268, prev 170F5/7E1F4220, desc: COMMIT 2025-05-01 09:24:57.828299 CST; inval msgs: catcache &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; catcache &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813261&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813255&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;51030741&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813252&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;50737247&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813246&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813243&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813237&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;50737241&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813234&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813224&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;49379811&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813216&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;48813210&lt;/span&gt; relcache &lt;span style="color:#ae81ff"&gt;45452775&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The transaction for &lt;code&gt;rel 1663/17662/1259&lt;/code&gt; had 180,000 records. The last record was inval msgs: ~70,000 catcache entries and ~30,000 relcache entries.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rel 1663/17662/1259&lt;/code&gt; is &lt;code&gt;pg_class&lt;/code&gt;. Querying by xmin reveals the affected tables and commit time:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,pg_xact_commit_timestamp(xmin),relname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; xmin&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;4285897514&amp;#39;&lt;/span&gt;::xid &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_xact_commit_timestamp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------+-------------------------------+---------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; v$session
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tmp_20230801_id_seq
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tmp_20230801
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_param
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4285897514&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828299&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_20240105
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; xmin&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;4285897514&amp;#39;&lt;/span&gt;::xid ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;18523&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;139138&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Checking the pglog by timestamp:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2025-05-01 09:24:59.837 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,61418,&lt;span style="color:#e6db74"&gt;&amp;#34;[local]&amp;#34;&lt;/span&gt;,6812cd65.efea,3,&lt;span style="color:#e6db74"&gt;&amp;#34;DO&amp;#34;&lt;/span&gt;,2025-05-01 09:24:53 CST,549/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 6036.275 ms statement: 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; EXECUTE &amp;#39;GRANT SELECT ON ALL TABLES IN SCHEMA public TO r_lzldbdata_qry&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; END;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; &lt;/span&gt;$$&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;psql&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can basically confirm that the GRANT operation was the culprit. GRANT updates &lt;code&gt;relacl&lt;/code&gt; in &lt;code&gt;pg_class&lt;/code&gt;, and at least 18,000 relations had their permissions updated. Updates to &lt;code&gt;pg_class&lt;/code&gt; trigger invalidation messages, and the massive number of invalidation messages were being processed slowly in the walsender process.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Reproduction
 &lt;div id="reproduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reproduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create a logical replication slot, any kind will do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test_decoding&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_recvlogical &lt;span style="color:#f92672"&gt;-&lt;/span&gt;h &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;p &lt;span style="color:#ae81ff"&gt;7997&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#f92672"&gt;-&lt;/span&gt;U repuser &lt;span style="color:#75715e"&gt;--slot=logical_test --start -f recv.sql &amp;amp;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create many tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;20000&lt;/span&gt; LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; format(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;CREATE TABLE IF NOT EXISTS table_%s ( 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; col1 varchar(10)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; )&amp;#39;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lpad(i::text, &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style="color:#75715e"&gt;-- Generate 5-digit numbered table names
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; LOOP;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Single GRANT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; tables &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; r_lzldb_qry;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Perfectly reproduced
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzlhost:&lt;span style="color:#f92672"&gt;~/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;172862&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; hash_seq_search (status&lt;span style="color:#f92672"&gt;=&lt;/span&gt;status&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x7ffd664be280) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; dynahash.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1444&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31235e728 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; invalidate_psin_entries_using_relid (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1002857&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;relation_info.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;251&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31235eb3d &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; forget_status_of_relation (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1002857&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;relation_info.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;232&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31236ec96 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pathman_relcache_hook (arg&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1002857&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;hooks.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;934&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000087168a &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; LocalExecuteInvalidationMessage (msg&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2ad3c3f61a88) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; inval.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;595&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000071d50e &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ReorderBufferExecuteInvalidations (rb&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x17e5698, txn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x180d698, txn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x180d698) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; reorderbuffer.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2238&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzlhost:&lt;span style="color:#f92672"&gt;~/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;172862&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000891d0c &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; hash_seq_search (status&lt;span style="color:#f92672"&gt;=&lt;/span&gt;status&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x7ffd664be280) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; dynahash.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1441&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31235e728 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; invalidate_psin_entries_using_relid (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1011110&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;relation_info.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;251&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31235eb3d &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; forget_status_of_relation (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1011110&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;relation_info.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;232&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00002ad31236ec96 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pathman_relcache_hook (arg&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1011110&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;hooks.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;934&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- relid keeps changing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- CPU pegged at 100%:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ps &lt;span style="color:#f92672"&gt;-&lt;/span&gt;eo pid,&lt;span style="color:#f92672"&gt;%&lt;/span&gt;cpu,&lt;span style="color:#f92672"&gt;%&lt;/span&gt;mem&lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#ae81ff"&gt;172862&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;172862&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Takes about 2 hours to catch up&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Accelerating Walsender by Removing Pathman
 &lt;div id="accelerating-walsender-by-removing-pathman" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#accelerating-walsender-by-removing-pathman" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since the database wasn&amp;rsquo;t actually using pathman partitioned tables but had the extension installed, we tried bypassing the pathman hook to speed up walsender processing.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; extension pg_pathman;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; tables &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; r_lzldb_upd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzlhost&lt;span style="color:#f92672"&gt;~/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;133460&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; hash_seq_search (status&lt;span style="color:#f92672"&gt;=&lt;/span&gt;status&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x7ffe292d5c90) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; dynahash.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1418&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000087f228 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; RelfilenodeMapInvalidateCallback (arg&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1034036&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; relfilenodemap.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000087168a &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; LocalExecuteInvalidationMessage (msg&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2b9699795768) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; inval.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;595&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000071d50e &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ReorderBufferExecuteInvalidations (rb&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x195a358, txn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1a6ff38, txn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1a6ff38) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; reorderbuffer.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2238&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; ReorderBufferCommit (rb&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x195a358, xid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;xid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;328684387&lt;/span&gt;, commit_lsn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8016890875224&lt;/span&gt;, end_lsn&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, commit_time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;commit_time&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;799851538975691&lt;/span&gt;, origin_id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;origin_id&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, origin_lsn&lt;span style="color:#f92672"&gt;=&lt;/span&gt;origin_lsn&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; reorderbuffer.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1819&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; Completed within &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; seconds&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Even without commenting out &lt;code&gt;pg_pathman&lt;/code&gt; from &lt;code&gt;shared_preload_libraries&lt;/code&gt;, there was a dramatic improvement — walsender went from 2 hours to 20 seconds.&lt;/p&gt;
&lt;p&gt;This seemed odd at first — without commenting &lt;code&gt;shared_preload_libraries&lt;/code&gt;, the hook should still run. Source analysis revealed the reason: the very first step of the hook checks for the pathman config table; if it doesn&amp;rsquo;t exist, it skips pathman&amp;rsquo;s invalidation logic entirely, so execution completes quickly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Invalidate PartRelationInfo cache entry if needed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pathman_relcache_hook&lt;/span&gt;(Datum arg, Oid relid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid pathman_config_relid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* See cook_partitioning_expression() */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;pathman_hooks_enabled)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;IsPathmanReady&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Invalidation event for PATHMAN_CONFIG table (probably DROP EXTENSION).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Digging catalogs here is expensive and probably illegal, so we take
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * cached relid. It is possible that we don&amp;#39;t know it atm (e.g. pathman
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * was disabled). However, in this case caches must have been cleaned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * on disable, and there is no DROP-specific additional actions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	pathman_config_relid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;get_pathman_config_relid&lt;/span&gt;(true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; pathman_config_relid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;delay_pathman_shutdown&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Invalidation event for some user table */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relid &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; FirstNormalObjectId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Invalidate PartBoundInfo entry if needed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;forget_bounds_of_rel&lt;/span&gt;(relid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Invalidate PartStatusInfo entry if needed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;forget_status_of_relation&lt;/span&gt;(relid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Invalidate PartParentInfo entry if needed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;forget_parent_of_partition&lt;/span&gt;(relid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;get_pathman_config_relid&lt;/code&gt; fetches the pathman_config table. &lt;code&gt;drop extension pg_pathman&lt;/code&gt; removes the pathman_config table from the database, so the source code never enters the &lt;code&gt;forget_*&lt;/code&gt; logic.&lt;/p&gt;
&lt;p&gt;There are other ways to accelerate walsender processing: setting &lt;code&gt;pg_pathman.enable=off&lt;/code&gt; causes &lt;code&gt;IsPathmanReady()&lt;/code&gt; to return false and bail out immediately. Or, most directly, comment out &lt;code&gt;pg_pathman&lt;/code&gt; from &lt;code&gt;shared_preload_libraries&lt;/code&gt; and restart the instance (this is instance-level, not database-level).&lt;/p&gt;

&lt;h2 class="relative group"&gt;Improvements in PG14
 &lt;div id="improvements-in-pg14" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#improvements-in-pg14" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PG14.0 release notes:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Allow logical decoding to more efficiently process cache invalidation messages (Dilip Kumar)
This allows logical decoding to work efficiently in presence of a large amount of DDL.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/release/14.0/" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/release/14.0/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Patch:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d71" target="_blank" rel="noreferrer"&gt;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d71&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Comment from PG14&amp;rsquo;s &lt;code&gt;ReorderBufferAddInvalidations&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;We require to record it in form of the change so that we can execute only the required invalidations instead of executing all the invalidations on each CommandId increment.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Comparing PG14 vs PG13, &lt;code&gt;ReorderBufferCommit&lt;/code&gt; underwent a major rewrite.&lt;/p&gt;
&lt;p&gt;In PG13, transaction processing logic was directly in the &lt;code&gt;ReorderBufferCommit&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.command_id &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; InvalidCommandId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (command_id &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.command_id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						command_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.command_id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;snapshot_now&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;copied)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							&lt;span style="color:#75715e"&gt;/* we don&amp;#39;t use the global one anymore */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							snapshot_now &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReorderBufferCopySnap&lt;/span&gt;(rb, snapshot_now,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;																txn, command_id);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						snapshot_now&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; command_id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;TeardownHistoricSnapshot&lt;/span&gt;(false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;SetupHistoricSnapshot&lt;/span&gt;(snapshot_now, txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tuplecid_hash);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 * Every time the CommandId is incremented, we could
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 * see new catalog contents, so execute all
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 * invalidations.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;ReorderBufferExecuteInvalidations&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In PG14, the main logic moved to &lt;code&gt;ReorderBufferReplay&lt;/code&gt; -&amp;gt; &lt;code&gt;ReorderBufferProcessTXN&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ReorderBufferProcessTXN&lt;/code&gt; introduced a new &lt;code&gt;case REORDER_BUFFER_CHANGE_INVALIDATION&lt;/code&gt; branch to execute invalidations from the reorder buffer:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; REORDER_BUFFER_CHANGE_INVALIDATION:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#75715e"&gt;/* Execute the invalidation messages locally */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;ReorderBufferExecuteInvalidations&lt;/span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.inval.ninvalidations,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 change&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.inval.invalidations);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The logic after &lt;code&gt;ReorderBufferExecuteInvalidations&lt;/code&gt; is largely the same. The main differences between PG13 and PG14&amp;rsquo;s &lt;code&gt;ReorderBufferCommit&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ReorderBufferCommit&lt;/code&gt; is no longer the primary transaction processing function; the call stack is deeper&lt;/li&gt;
&lt;li&gt;A new &lt;code&gt;case REORDER_BUFFER_CHANGE_INVALIDATION&lt;/code&gt; branch was added, separated from &lt;code&gt;REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID&lt;/code&gt;, to handle invalidations independently&lt;/li&gt;
&lt;li&gt;The per-command_id invalidation processing logic was removed&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Root Cause and Solutions
 &lt;div id="root-cause-and-solutions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#root-cause-and-solutions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The root cause of the walsender hang was a bulk GRANT operation that updated many rows in &lt;code&gt;pg_class&lt;/code&gt;, triggering a massive number of invalidation messages. A statement like &lt;code&gt;GRANT privs ON ALL TABLES IN SCHEMA public TO role1&lt;/code&gt; executes as multiple commands within a single transaction in PostgreSQL. In PG13, logical replication processes invalidation messages per-command, invoking each hook&amp;rsquo;s inval hash table processing. In this scenario, pathman&amp;rsquo;s hook was particularly slow at processing the inval hash table, causing replication lag.&lt;/p&gt;
&lt;p&gt;Conditions for pathman-induced slowness (all must apply):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG13 or earlier&lt;/li&gt;
&lt;li&gt;Bulk GRANT&lt;/li&gt;
&lt;li&gt;pathman extension installed (whether used or not)&lt;/li&gt;
&lt;li&gt;Logical replication slot active&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even after removing pathman, significant CPU time was still spent in functions like &lt;code&gt;RelfilenodeMapInvalidateCallback&lt;/code&gt;. In PG13 testing, the processing time difference between with and without pathman was hours vs. minutes.&lt;/p&gt;
&lt;p&gt;Other untested but community-mentioned scenarios (all must apply):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG13 or earlier&lt;/li&gt;
&lt;li&gt;Bulk DDL / TRUNCATE / DCL / DROP PUBLICATION&lt;/li&gt;
&lt;li&gt;Logical replication slot active&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Short-term fix: If pathman tables are not in use, drop the extension or unload the pathman shared library; restart the replication slot.&lt;/p&gt;
&lt;p&gt;Long-term fix: Upgrade to PG14+ (tested — extremely fast with no lag).&lt;/p&gt;

&lt;h3 class="relative group"&gt;
 &lt;div id="" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/flat/17716-1fe42e7b44fc2f25%40postgresql.org" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/message-id/flat/17716-1fe42e7b44fc2f25%40postgresql.org&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d71" target="_blank" rel="noreferrer"&gt;https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d7eb52d71&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Linux Memory Advanced</title><link>https://lastdba.com/en/2025/06/19/linux-memory-advanced/</link><pubDate>Thu, 19 Jun 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/06/19/linux-memory-advanced/</guid><description>&lt;p&gt;(For memory basics, refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/135492312?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;Linux Memory Analysis&lt;/a&gt;; this article covers memory knowledge above that foundation)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Basic Concepts
 &lt;div id="memory-basic-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-basic-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;buddy
 &lt;div id="buddy" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buddy" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The process of buddy system allocating and merging pages is omitted.&lt;/p&gt;
&lt;p&gt;Easily overlooked knowledge points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The prerequisite for buddy merging two blocks of the same size is that their &lt;strong&gt;physical addresses are contiguous&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The merge algorithm is iterative: after merging at the current level, it will automatically attempt to merge larger blocks. This means compactd is not strictly required for merging&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;page table &amp;amp; PTE
 &lt;div id="page-table--pte" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page-table--pte" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;page table and PTE are actually two different concepts, and they are easily confused because both generally refer to page tables. Below is relevant knowledge about page table and PTE[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]&lt;/p&gt;</description><content:encoded>&lt;p&gt;(For memory basics, refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/135492312?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;Linux Memory Analysis&lt;/a&gt;; this article covers memory knowledge above that foundation)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Basic Concepts
 &lt;div id="memory-basic-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-basic-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;buddy
 &lt;div id="buddy" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buddy" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The process of buddy system allocating and merging pages is omitted.&lt;/p&gt;
&lt;p&gt;Easily overlooked knowledge points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The prerequisite for buddy merging two blocks of the same size is that their &lt;strong&gt;physical addresses are contiguous&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;The merge algorithm is iterative: after merging at the current level, it will automatically attempt to merge larger blocks. This means compactd is not strictly required for merging&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;page table &amp;amp; PTE
 &lt;div id="page-table--pte" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page-table--pte" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;page table and PTE are actually two different concepts, and they are easily confused because both generally refer to page tables. Below is relevant knowledge about page table and PTE[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PTE stores the physical address of the page frame&lt;/li&gt;
&lt;li&gt;&amp;ldquo;page table&amp;rdquo; and &amp;ldquo;Page Table&amp;rdquo; are different concepts: &amp;ldquo;page table&amp;rdquo; refers to the pages that maintain the mapping between linear addresses and physical addresses, while &amp;ldquo;Page Table&amp;rdquo; refers to pages in the upper-level page table&lt;/li&gt;
&lt;li&gt;pte_t, pmd_t, pud_t, pgd_t describe Page Table Entry, Page Middle Directory entry, Page Upper Directory entry, and Page Global Directory entry respectively&lt;/li&gt;
&lt;li&gt;PTE is Page Table Entry&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only look at the size of the pagetable used by the MMU to cache virtual-to-physical memory mappings, confusing pagetable with PTE doesn&amp;rsquo;t make much difference. However, if you need to go deep into page table directories, you need to separate the two concepts.&lt;/p&gt;

&lt;h3 class="relative group"&gt;TLB
 &lt;div id="tlb" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tlb" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Each level of the page table is stored in memory. To complete a single virtual-to-physical address translation, all four page tables corresponding to the current virtual address must be found. &lt;strong&gt;This means a single memory IO requires looking up the page table in memory 4 times just for virtual-to-physical address translation&lt;/strong&gt;. Translation Lookaside Buffers (TLB) are caches specifically designed to accelerate virtual-to-physical address translation.&lt;/p&gt;
&lt;p&gt;Regarding the TLB&amp;rsquo;s location, it is usually in the L1 cache (some say it&amp;rsquo;s in registers or L2, which likely depends on the CPU architecture; for now, just consider it as CPU cache, distinct from main memory)&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0a897b5be8a9.png" alt="image.png" /&gt;
In modern processors, the L1 cache is typically divided into multiple parts, including data cache dTLB and instruction cache iTLB. Frequently modifying page tables leads to increased main memory accesses, causing the CPU to frequently flush the TLB cache[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]. The TLB also has a finite size; improving TLB hit rate can reduce accesses to the main memory pagetable. Using huge pages can reduce PTEs by three orders of magnitude, greatly reducing TLB misses.[^ 《深入理解Linux进程和内存》 (Understanding Linux Processes and Memory)].&lt;/p&gt;
&lt;p&gt;TLB information:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#cpuid -l&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; L1 TLB/cache information: 2M/4M pages &amp;amp; L1 TLB &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x80000005/eax&lt;span style="color:#f92672"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; L1 TLB/cache information: 4K pages &amp;amp; L1 TLB &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x80000005/ebx&lt;span style="color:#f92672"&gt;)&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; L2 TLB/cache information: 2M/4M pages &amp;amp; L2 TLB &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x80000006/eax&lt;span style="color:#f92672"&gt;)&lt;/span&gt;:&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Observing TLB hit rate:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;perf stat -e dTLB-loads,dTLB-load-misses,iTLB-loads,iTLB-load-misses -I &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; -p $PM_PID &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;During memory reclamation, TLB misses do increase, but it&amp;rsquo;s hard to establish a causal relationship. TLB miss is just one observation metric for the MMU — TLB is part of MMU.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Reverse Mapping
 &lt;div id="reverse-mapping" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reverse-mapping" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The general principles of PFRA (Page Frame Reclaiming Algorithm)[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;First, release &amp;ldquo;harmless&amp;rdquo; pages. Start by reclaiming harmless pages in the pagecache — pages not occupied by any process&lt;/li&gt;
&lt;li&gt;All pages of user-mode processes are candidates for reclamation. FRPA will gradually deprive user-mode pages with longer sleep times of their page frames&lt;/li&gt;
&lt;li&gt;Cancel the mapping of all page table entries for a shared page frame, then reclaim that shared page frame&lt;/li&gt;
&lt;li&gt;Only reclaim &amp;ldquo;unused&amp;rdquo; pages&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;One of PFRA&amp;rsquo;s goals is to be able to release shared page frames. The process of quickly locating all page table entries pointing to the same page frame is called reverse mapping.&lt;/p&gt;
&lt;p&gt;Reverse mappings for shared&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Anonymous pages&lt;/li&gt;
&lt;li&gt;File-mapping pages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Basic tricks of page frame reclaiming&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LRU lists&lt;/li&gt;
&lt;li&gt;Free cheapest pages first&lt;/li&gt;
&lt;li&gt;Unmap all at once&lt;/li&gt;
&lt;li&gt;Etc&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Huge Pages
 &lt;div id="huge-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#huge-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Enabling huge pages provides certain performance improvements for specific application workloads. In PostgreSQL, enabling huge pages on large-memory instances also offers some performance gains and even some stability benefits.&lt;/p&gt;
&lt;p&gt;Why are huge pages better?&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduced TLB pressure&lt;/li&gt;
&lt;li&gt;Reduced pagetable size in main memory&lt;/li&gt;
&lt;li&gt;Huge pages are physically contiguous. Contiguous physical memory access is better than non-contiguous physical memory access&lt;/li&gt;
&lt;li&gt;When using these kinds of larger pages, higher level pages can directly map them, with no need to use lower level page entries[^ kernel.org,mm pagetables]&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, using huge pages brings management challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Huge pages need to be pre-allocated&lt;/li&gt;
&lt;li&gt;Huge page size must be calculated in advance to avoid memory waste&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Two ways for processes to use huge pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first is by using &lt;code&gt;shmget()&lt;/code&gt; to setup a shared region backed by huge pages&lt;/li&gt;
&lt;li&gt;the second is the call &lt;code&gt;mmap()&lt;/code&gt; on a file opened in the huge page filesystem&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;C Library and System Calls
 &lt;div id="c-library-and-system-calls" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#c-library-and-system-calls" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The middle layer between kernel space and user space is the system call layer. Application Programming Interfaces (APIs) and system calls are different. Applications call APIs implemented in user space to program, rather than directly executing system calls. In the UNIX world, the most common system call layer is the POSIX standard (Portable Operation System Interface of UNIX). The POSIX standard targets APIs, not system calls. The Linux operating system&amp;rsquo;s API is typically provided in the form of C standard libraries, such as libc. The C standard library provides implementations for most POSIX APIs.[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)]&lt;/p&gt;
&lt;p&gt;C app-&amp;gt;C lib-&amp;gt;system calls-&amp;gt;OS-&amp;gt;hardware&lt;sup id="fnref:4"&gt;&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref"&gt;4&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/72d91b350d7d.png" alt="image.png" /&gt;
Common C library and system calls:&lt;/p&gt;
&lt;p&gt;malloc,free=&amp;gt;C lib&lt;/p&gt;
&lt;p&gt;mmap、brk、munmap=&amp;gt;system calls&lt;/p&gt;

&lt;h3 class="relative group"&gt;Page Fault Exception
 &lt;div id="page-fault-exception" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page-fault-exception" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Page fault exceptions (or page fault interrupts) need to distinguish two cases: exceptions caused by programming errors; and physical page allocation behavior triggered by using virtual address space where physical page frames haven&amp;rsquo;t been allocated yet.[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Exceptional page fault: Segment Fault — each virtual memory area has associated permissions. If a process accesses a memory area outside its valid range, or illegally accesses a memory area, or accesses a memory area in an incorrect manner, the processor reports a page fault exception. In severe cases, it reports a &amp;ldquo;Segment Fault&amp;rdquo; and terminates the process[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)].&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Normal page fault: System calls like mmap and brk manage virtual memory; they don&amp;rsquo;t directly allocate physical memory. Virtual memory system call functions only establish the process address space. Virtual memory is visible in user space, but no mapping between virtual memory and physical memory has been established. When a process accesses virtual memory where no mapping has been established, a page fault interrupt is triggered.[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)]&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Page faults are also divided into two types:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;minor fault: the page fault was handled without blocking the current process, and a page frame was allocated&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;major fault: the page fault forced the current process to sleep (likely because filling the page frame with data from disk took time). A page fault that blocks the current process is a major fault[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Copy-On-Write (COW)
 &lt;div id="copy-on-write-cow" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#copy-on-write-cow" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When the fork system call is executed, the child process and parent process have independent process address spaces but share physical memory resources, including process context, process stack, memory information, file descriptors, directories, resource limits, etc. Only the parent process&amp;rsquo;s page table needs to be copied to the child process. At this point, sharing is read-only. When writing is needed (when running their respective tasks), data is copied, giving the parent and child processes their own copies.[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)]&lt;/p&gt;
&lt;p&gt;For PostgreSQL&amp;rsquo;s multi-process model, fork itself isn&amp;rsquo;t heavy — you may only need to worry about page tables — but the various tasks that come after fork will trigger copy-on-write to create the child process&amp;rsquo;s own resource copies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note the distinction between copy-on-write and page fault exceptions: copy-on-write refers to resources not being allocated to the child process at fork time; page fault exceptions refer to physical memory allocation occurring for this process, unrelated to fork.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;mmap, brk &amp;amp; Shared Memory Mapping Area, Heap Area
 &lt;div id="mmap-brk--shared-memory-mapping-area-heap-area" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mmap-brk--shared-memory-mapping-area-heap-area" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The functions and memory address regions used by mmap and brk are different:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;mmap&lt;/code&gt; is used to manage shared memory, corresponding to the shared memory mapping area&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;brk&lt;/code&gt; is used to manage private memory, corresponding to the heap area&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Linear address region functions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;mmap: The mapping area expands top-down. The mmap mapping area and heap expand toward each other until they exhaust the remaining space in the virtual address space. This structure facilitates the C runtime library&amp;rsquo;s use of the mmap mapping area and heap for memory allocation.&lt;/li&gt;
&lt;li&gt;Stack: Stores local variables and function parameters during program execution, grows from high addresses to low addresses&lt;/li&gt;
&lt;li&gt;Heap: Dynamic memory allocation area, managed through functions like malloc, new, free, and delete&lt;/li&gt;
&lt;li&gt;BSS (Uninitialized Variables): Stores uninitialized global variables and static variables&lt;/li&gt;
&lt;li&gt;Data: Stores global variables and static variables with predefined values in source code&lt;/li&gt;
&lt;li&gt;Text (Code): Stores read-only program execution code, i.e., machine instructions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Shared memory mapping area and heap area&lt;sup id="fnref:5"&gt;&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref"&gt;5&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/974eb641977f.png" alt="image.png" /&gt;
Real postmaster heap and shared memory mapping:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/1063005/smaps |grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;\-s|heap&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;022a4000-022ee000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fef6019e000-7fef601a5000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:17 &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; /dev/shm/PostgreSQL.1291978332
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fef601a5000-7fef6098b000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:01 &lt;span style="color:#ae81ff"&gt;1052&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#75715e"&gt;#this is shared buffers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fef6e238000-7fef6e239000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:01 &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; /SYSV0011f702 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can see the heap and shared memory area addresses roughly match.&lt;/p&gt;

&lt;h2 class="relative group"&gt;VM
 &lt;div id="vm" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vm" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Linux kernel virtual memory subsystem&lt;/p&gt;
&lt;p&gt;Directory: &lt;code&gt;cd /proc/sys/vm/&lt;/code&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;compact
 &lt;div id="compact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#compact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;concept &amp;amp; param
 &lt;div id="concept--param" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concept--param" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Memory compaction is a mechanism in the Linux kernel for solving memory fragmentation problems. It improves the allocation and compaction efficiency of large contiguous memory pages by merging free physical pages.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Function&lt;/th&gt;
 &lt;th&gt;Default/Range&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;compact_memory&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Manually trigger a global memory compaction operation&lt;/td&gt;
 &lt;td&gt;Write 1 to trigger&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;compaction_proactiveness&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Controls the frequency of proactive compaction&lt;/td&gt;
 &lt;td&gt;Parameter available since 4.x. 0-100 (default 20)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;compact_unevictable_allowed&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Whether to allow compaction of unreclaimable pages (e.g., &lt;code&gt;mlock&lt;/code&gt; locked memory)&lt;/td&gt;
 &lt;td&gt;Parameter available since 4.x. 0 (disable) or 1 (allow)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;defrag_mode&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Controls the trigger strategy for memory defragmentation&lt;/td&gt;
 &lt;td&gt;Parameter available since 4.x. 0-3. 0 disables automatic compaction; 1 defers passive compaction. Default in 3.10 is 1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;extfrag_threshold&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Threshold for triggering compaction when large memory blocks are insufficient&lt;/td&gt;
 &lt;td&gt;0-1000 (default 500)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;There are 3 compaction modes (depending on kernel version support):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Passive compaction: &lt;code&gt;extfrag_threshold&lt;/code&gt; addresses &amp;ldquo;already occurred&amp;rdquo; fragmentation problems — triggered when a process requests large memory blocks and finds them insufficient.&lt;/li&gt;
&lt;li&gt;Proactive compaction: &lt;code&gt;compaction_proactiveness&lt;/code&gt; proactively controls compaction aggressiveness, optimizing &amp;ldquo;not yet occurred&amp;rdquo; but potential fragmentation risks.&lt;/li&gt;
&lt;li&gt;Manual compaction: &lt;code&gt;compact_memory&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;extfrag_threshold&lt;/code&gt; is the Linux kernel parameter controlling passive compaction. When the kernel fails to allocate high-order contiguous physical memory (e.g., huge pages), it determines the failure cause via the fragmentation index:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-1&lt;/code&gt;: Allocation succeeded (watermark satisfied)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;0&lt;/code&gt;: Failed due to insufficient memory&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1000&lt;/code&gt;: Failed due to fragmentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;View specific values via &lt;code&gt;/sys/kernel/debug/extfrag/extfrag_index&lt;/code&gt;. The output is a floating-point number (e.g., &lt;code&gt;0.854&lt;/code&gt;), but the actual range is magnified 1000x, so &lt;code&gt;0.854&lt;/code&gt; corresponds to an actual value of 854:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /sys/kernel/debug/extfrag/extfrag_index |grep Normal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.995 0.998 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If extfrag_threshold=600, compaction is triggered when the fragmentation index &amp;gt; 600. extfrag_index is quite useful and can assist buddy in observing fragmentation issues.&lt;/p&gt;

&lt;h3 class="relative group"&gt;dirty
 &lt;div id="dirty" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#dirty" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;concept &amp;amp; param
 &lt;div id="concept--param-1" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concept--param-1" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Dirty page flushing is somewhat similar to memory reclamation and is also divided into asynchronous and synchronous:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Asynchronous flushing: performed by background threads like pdflush/flush/kdmflush; application writes are not affected&lt;/li&gt;
&lt;li&gt;Synchronous flushing: directly blocks the application process; the process that initiated the write operation flushes the dirty pages itself&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter Name&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_background_bytes&lt;/td&gt;
 &lt;td&gt;Background async flush threshold, in bytes&lt;/td&gt;
 &lt;td&gt;0 (disabled)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_background_ratio&lt;/td&gt;
 &lt;td&gt;Background async flush threshold, as percentage&lt;/td&gt;
 &lt;td&gt;10%&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_bytes&lt;/td&gt;
 &lt;td&gt;Synchronous flush threshold, in bytes&lt;/td&gt;
 &lt;td&gt;0 (disabled)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_ratio&lt;/td&gt;
 &lt;td&gt;Synchronous flush threshold, as percentage&lt;/td&gt;
 &lt;td&gt;20-40%&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_expire_centisecs&lt;/td&gt;
 &lt;td&gt;Maximum lifetime of dirty pages in memory&lt;/td&gt;
 &lt;td&gt;3000 (30s)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;dirty_writeback_centisecs&lt;/td&gt;
 &lt;td&gt;Frequency of kernel periodic dirty page state checks&lt;/td&gt;
 &lt;td&gt;500 (5s)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;xxx_bytes and xxx_ratio parameters are mutually exclusive.&lt;/p&gt;
&lt;p&gt;Example parameters and flowchart:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_background_bytes &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_background_ratio &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_bytes &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_ratio &lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_expire_centisecs &lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dirty_writeback_centisecs &lt;span style="color:#ae81ff"&gt;500&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-mermaid" data-lang="mermaid"&gt;%% Dirty page flushing flow diagram integrating time parameters
graph TD
 A[App writes generate dirty pages] --&amp;gt; B{Check interval reached?&amp;lt;br&amp;gt;dirty_writeback_centisecs every 5s}
 B -- No --&amp;gt; D{Expired dirty pages exist?&amp;lt;br&amp;gt; dirty_expire_centisecs&amp;gt;30s}
 B -- Yes --&amp;gt; C{Dirty page threshold check}
 C --&amp;gt; E[Dirty page ratio? dirty_background_ratio&amp;gt;10% ]
 C --&amp;gt; F[Dirty page ratio? dirty_ratio&amp;gt; 40%]
 E -- Trigger --&amp;gt; G[Background async flush]
 F -- Trigger --&amp;gt; H[Synchronous flush]
 D -- Trigger --&amp;gt; G
 G --&amp;gt; I[Dirty pages written to disk]
 H --&amp;gt; I[Dirty pages written to disk] 
 I --&amp;gt; J[Free memory space]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The configuration principles for dirty page flush parameters are basically the same as PostgreSQL dirty page flush parameters. Setting them too low causes overly frequent flushing — the same dirty page may be written to disk multiple times, wasting IO. Setting them too high may cause IO storms.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Observing Dirty Pages
 &lt;div id="observing-dirty-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#observing-dirty-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Monitoring dirty pages:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ps -eo pid,%cpu,%mem,wchan,args,state|grep kdmflush|grep -E -w -v &lt;span style="color:#e6db74"&gt;&amp;#34;S&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Observe async flush process state&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/vmstat| grep -E -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_dirty|nr_writeback&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#vmstat dirty, page count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo |grep -i dirty &lt;span style="color:#75715e"&gt;#meminfo dirty, KB&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Testing dirty pages with dd:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;nr_dirty_threshold|nr_dirty_background_threshold&amp;#34;&lt;/span&gt; /proc/vmstat | awk &lt;span style="color:#e6db74"&gt;&amp;#39;{printf &amp;#34;%s: %.2fGB\n&amp;#34;, $1, ($2*4)/1048576}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nr_dirty_threshold: 141.28GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nr_dirty_background_threshold: 35.32GB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dd &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/dev/zero of&lt;span style="color:#f92672"&gt;=&lt;/span&gt;testfile bs&lt;span style="color:#f92672"&gt;=&lt;/span&gt;8k count&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;128000&lt;/span&gt; &lt;span style="color:#75715e"&gt;# cache io &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Failed test (same result after multiple tests):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No RUNNING kdmflush process observed&lt;/li&gt;
&lt;li&gt;Dirty pages were flushed before reaching 35GB or 30S threshold&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Timestamp&lt;/th&gt;
 &lt;th&gt;nr_dirty&lt;/th&gt;
 &lt;th&gt;nr_dirty(GB)&lt;/th&gt;
 &lt;th&gt;Trend Simulation&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;17:00:18&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;2,757&lt;/td&gt;
 &lt;td&gt;0.01052&lt;/td&gt;
 &lt;td&gt;▍&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:19&lt;/td&gt;
 &lt;td&gt;336,199&lt;/td&gt;
 &lt;td&gt;1.282&lt;/td&gt;
 &lt;td&gt;████▌&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:25&lt;/td&gt;
 &lt;td&gt;1,984,867&lt;/td&gt;
 &lt;td&gt;7.574&lt;/td&gt;
 &lt;td&gt;██████████████▍&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;17:00:32&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;4,252,177&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;16.22&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;████████████████████&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:33&lt;/td&gt;
 &lt;td&gt;3,699,227&lt;/td&gt;
 &lt;td&gt;14.11&lt;/td&gt;
 &lt;td&gt;█████████████████▊&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:38&lt;/td&gt;
 &lt;td&gt;170,865&lt;/td&gt;
 &lt;td&gt;0.652&lt;/td&gt;
 &lt;td&gt;▎&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:46&lt;/td&gt;
 &lt;td&gt;2,865,814&lt;/td&gt;
 &lt;td&gt;10.93&lt;/td&gt;
 &lt;td&gt;█████████▋&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;17:00:54&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;4,721,827&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;18.01&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;██████████████████████&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:00:55&lt;/td&gt;
 &lt;td&gt;3,876,509&lt;/td&gt;
 &lt;td&gt;14.79&lt;/td&gt;
 &lt;td&gt;██████████████████&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;17:01:03&lt;/td&gt;
 &lt;td&gt;835,097&lt;/td&gt;
 &lt;td&gt;3.186&lt;/td&gt;
 &lt;td&gt;██▊&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;os dirty != pg dirty
 &lt;div id="os-dirty--pg-dirty" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#os-dirty--pg-dirty" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;With pg fsync=on, data writes go through the OS pagecache before specific blocks are written to disk. PostgreSQL has its own dirty pages, and the OS also has dirty pages. What&amp;rsquo;s the relationship between the two?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Observation commands&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo |grep -E -w &lt;span style="color:#e6db74"&gt;&amp;#34;Dirty&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;# OS dirty pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; isdirty,pinning_backends,count&lt;span style="color:#f92672"&gt;(&lt;/span&gt;*&lt;span style="color:#f92672"&gt;)&lt;/span&gt; from pg_buffercache where isdirty is true group by isdirty,pinning_backends; &lt;span style="color:#75715e"&gt;# PG dirty pages&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Observe
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000000&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Observe
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Observe
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Observe&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Test results:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;stage&lt;/th&gt;
 &lt;th&gt;dirty in pg&lt;/th&gt;
 &lt;th&gt;OS dirty&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Clean state&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;td&gt;0.02-2M fluctuating&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;After insert completion&lt;/td&gt;
 &lt;td&gt;200M&lt;/td&gt;
 &lt;td&gt;Rose to 1.7G, then dropped to 20KB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;After commit&lt;/td&gt;
 &lt;td&gt;200M&lt;/td&gt;
 &lt;td&gt;0.02-2M fluctuating&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;After checkpoint flush&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;td&gt;0.02-2M fluctuating&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;When the insert data size is increased, OS dirty rises during insert, rising to the GB level and then fluctuating.&lt;/p&gt;
&lt;p&gt;PG dirty has some relation to OS dirty but they&amp;rsquo;re not entirely correlated. When PG inserts data, OS dirty does rise, but after the OS flushes its own dirty pages, PG&amp;rsquo;s dirty pages remain dirty. Preliminary judgment: dirty pages in shared memory are unrelated to OS dirty. It&amp;rsquo;s yet to be determined whether the OS dirty increase comes from PG&amp;rsquo;s private memory dirty pages.&lt;/p&gt;

&lt;h3 class="relative group"&gt;swappiness
 &lt;div id="swappiness" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#swappiness" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Controls the kernel&amp;rsquo;s bias toward reclaiming memory from the anonymous memory pool or the page cache. Essentially, it controls whether swapping anonymous pages or reclaiming file pages imposes a lower cost for the upper-layer application. For example, for compute-oriented applications using more dynamic allocation or private memory, a lower swappiness should be set; for data-dependent applications, a higher swappiness should be set to reduce the impact of flushing file pages on data access. However, all of this depends on the efficiency of swap IO and filesystem IO&lt;sup id="fnref:6"&gt;&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref"&gt;6&lt;/a&gt;&lt;/sup&gt;. It all sounds ideal, but when swapping occurs, it very likely means performance degradation.&lt;/p&gt;

&lt;h4 class="relative group"&gt;swappiness=0
 &lt;div id="swappiness0" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#swappiness0" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When &lt;code&gt;swappiness=0&lt;/code&gt;, the kernel will only swap when memory reaches the high watermark&lt;sup id="fnref:7"&gt;&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref"&gt;7&lt;/a&gt;&lt;/sup&gt;. The specific strategy also relates to the kernel version and NUMA. What can be confirmed is that &lt;code&gt;swappiness=0&lt;/code&gt; does not mean swap is disabled — &lt;code&gt;swapoff -a&lt;/code&gt; is what disables the swap functionality.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Check if swap is enabled&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;swapon --show
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;free -h |grep Swap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/swaps
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep -E &lt;span style="color:#e6db74"&gt;&amp;#39;swap|none&amp;#39;&lt;/span&gt; /etc/fstab
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo|grep Swap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Monitor whether swapping is occurring&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/vmstat|grep swp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sar -W &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;inconsistent swap behavior
 &lt;div id="inconsistent-swap-behavior" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inconsistent-swap-behavior" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The OS-level /proc/sys/vm/swappiness has little-to-no effect on the swap behavior of cgroups v1 systems (has little-to-no effect on the swap). This issue can lead to inconsistent swap behavior&lt;sup id="fnref:8"&gt;&lt;a href="#fn:8" class="footnote-ref" role="doc-noteref"&gt;8&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Occurrence conditions (all must be true):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;vm.swappiness != cgroups memory.swappiness&lt;/li&gt;
&lt;li&gt;cgroups v1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cause:&lt;/p&gt;
&lt;p&gt;systemd creates cgroups early during startup, before &lt;code&gt;sysctl.service&lt;/code&gt; loads &lt;code&gt;/etc/sysctl.conf&lt;/code&gt;. vm.swappiness cannot constrain cgroup memory.swappiness. The issue is: when the OS swap behavior and cgroup behavior differ, which one takes effect?&lt;/p&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;for cgroup v1, set vm.swappiness = all cgroups memory.swappiness&lt;/li&gt;
&lt;li&gt;for cgroup v1, many solutions available, see &lt;a href="https://access.redhat.com/solutions/6785021" target="_blank" rel="noreferrer"&gt;https://access.redhat.com/solutions/6785021&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Use cgroup v2. v2 adds the vm.force_cgroup_v2_swappiness parameter, which disables cgroup&amp;rsquo;s memory.swappiness&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;memory overcommitment
 &lt;div id="memory-overcommitment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-overcommitment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;concept &amp;amp; param
 &lt;div id="concept--param-2" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concept--param-2" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Linux does not reserve physical memory for every virtual address; instead, it allocates memory only when actually needed. Overcommitment can limit the total virtual memory size that all processes can request. When the requested memory exceeds the defined physical memory size, it&amp;rsquo;s called overcommit.&lt;/p&gt;
&lt;p&gt;There are three overcommit policy parameters: &lt;code&gt;overcommit_memory&lt;/code&gt;, &lt;code&gt;overcommit_ratio&lt;/code&gt;/&lt;code&gt;overcommit_kbytes&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;overcommit_memory&lt;/code&gt; parameter controls the overcommitment policy:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;0&lt;/code&gt; (default): Heuristic overcommitment policy, allows slight overcommit. CommitLimit = physical memory + swap.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;1&lt;/code&gt;: No overcommit check&lt;/li&gt;
&lt;li&gt;&lt;code&gt;2&lt;/code&gt;: Strict limit, prohibits exceeding &lt;code&gt;CommitLimit&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-mermaid" data-lang="mermaid"&gt;graph TD
 A[Memory allocation request] --&amp;gt; B{Overcommit mode}
 B --&amp;gt;|Mode 0: Heuristic| C[&amp;#34;Allow moderate virtual memory overcommit&amp;#34;]
 B --&amp;gt;|Mode 1: Unlimited| D[&amp;#34;Virtual memory commits unconstrained&amp;#34;]
 B --&amp;gt;|Mode 2: Strict| E[&amp;#34;Virtual memory total ≤ CommitLimit&amp;#34;]
 C --&amp;gt; F[Allocate physical pages on demand at runtime]
 D --&amp;gt; G[May exhaust physical memory + Swap]
 E --&amp;gt; H[Enforce virtual memory total control]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;overcommit_memory=2&lt;/code&gt;, only one of the &lt;code&gt;overcommit_ratio&lt;/code&gt; and &lt;code&gt;overcommit_kbytes&lt;/code&gt; parameters takes effect. The &lt;code&gt;CommitLimit&lt;/code&gt; is calculated as follows:
$$
CommitLimit = (RAM - huge page memory) × \frac{overcommit_ratio}{100} + SwapTotal
$$
or
$$
CommitLimit = (RAM - huge page memory) + overcommit_kbytes + SwapTotal
$$
Interesting overcommit accounting&lt;sup id="fnref:9"&gt;&lt;a href="#fn:9" class="footnote-ref" role="doc-noteref"&gt;9&lt;/a&gt;&lt;/sup&gt; — mmap, brk, fork are all accounted for, which clearly affects PostgreSQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Status
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account mmap memory mappings
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account mprotect changes in commit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account mremap changes in size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account brk
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We account munmap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	We report the commit status in /proc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	Account and check on fork
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	Review stack handling/building on exec
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	SHMfs accounting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;o	Implement actual limit enforcement&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Reserve Memory and Overcommit
 &lt;div id="reserve-memory-and-overcommit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reserve-memory-and-overcommit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;user_reserve_kbytes&lt;/code&gt;: When overcommit_memory=2, physical memory reserved for ordinary user processes. When system memory is severely insufficient, it ensures ordinary users can still perform basic operations (like starting new processes, handling memory allocation requests). Default is min(3% of the current process size, 128M). When set to 0, a single process can allocate (all free memory - admin_reserve_kbytes)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;admin_reserve_kbytes&lt;/code&gt;: Physical memory reserved for users with &lt;code&gt;CAP_SYS_ADMIN&lt;/code&gt; privileges (typically root user), ensuring admin recovery capability — reserved physical memory ensuring the system administrator can log in and execute commands. Default is min(3% memory, 8MB). When using strict overcommit mode, it&amp;rsquo;s best to increase this parameter.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat user_reserve_kbytes 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;131072&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat admin_reserve_kbytes 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Observing Overcommit
 &lt;div id="observing-overcommit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#observing-overcommit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep -E &lt;span style="color:#e6db74"&gt;&amp;#39;CommitLimit|Committed_AS&amp;#39;&lt;/span&gt; /proc/meminfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sar -r &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ grep -E &lt;span style="color:#e6db74"&gt;&amp;#39;CommitLimit|Committed_AS&amp;#39;&lt;/span&gt; /proc/meminfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CommitLimit: &lt;span style="color:#ae81ff"&gt;203103492&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Committed_AS: &lt;span style="color:#ae81ff"&gt;252170700&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ sar -r &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;07:32:35 PM kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;07:32:37 PM &lt;span style="color:#ae81ff"&gt;25472180&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;370249056&lt;/span&gt; 93.56 &lt;span style="color:#ae81ff"&gt;14588&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;274485956&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;252242936&lt;/span&gt; 62.91 &lt;span style="color:#ae81ff"&gt;233866528&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;103568816&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12924&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;07:32:38 PM &lt;span style="color:#ae81ff"&gt;25471904&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;370249332&lt;/span&gt; 93.56 &lt;span style="color:#ae81ff"&gt;14588&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;274487888&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;252242740&lt;/span&gt; 62.91 &lt;span style="color:#ae81ff"&gt;233851748&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;103570136&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11180&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Metric meanings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;meminfo CommitLimit: CommitLimit calculated from physical memory, Swap, and overcommit parameters&lt;/li&gt;
&lt;li&gt;meminfo Committed_AS: Total virtual memory currently requested by all processes&lt;/li&gt;
&lt;li&gt;sar -r kbcommit = Committed_AS&lt;/li&gt;
&lt;li&gt;sar -r %commit = kbcommit / total physical memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;smaps or status can also show total requested virtual memory, but directly summing smaps/status total virtual memory double-counts shared library files and mapped files (like mmap), while &lt;code&gt;Committed_AS&lt;/code&gt; only counts memory requested via mmap, brk, fork, etc., and does not double-count shared memory. The two have different calculation scopes. For total virtual memory, just look at Committed_AS or kbcommit.&lt;/p&gt;

&lt;h3 class="relative group"&gt;watermark
 &lt;div id="watermark" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#watermark" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter Name&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;th&gt;Introduced&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Unit/Range&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;min_free_kbytes&lt;/td&gt;
 &lt;td&gt;Defines the minimum free memory the system reserves, directly affecting the watermarks &lt;code&gt;watermark[min]&lt;/code&gt; calculation, ensuring the system retains enough memory for critical operations when memory is tight&lt;/td&gt;
 &lt;td&gt;Early kernel versions&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;KB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;watermark_scale_factor&lt;/td&gt;
 &lt;td&gt;Globally adjusts the memory watermark gap (&lt;code&gt;high-low&lt;/code&gt; and &lt;code&gt;low-min&lt;/code&gt;)&lt;/td&gt;
 &lt;td&gt;Linux kernel 4.x (exact minor version unknown)&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;10&lt;/code&gt; (0.1% physical memory)&lt;/td&gt;
 &lt;td&gt;Max &lt;code&gt;3000&lt;/code&gt; (30% physical memory)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;watermark_boost_factor&lt;/td&gt;
 &lt;td&gt;Temporarily raises the high watermark (&lt;code&gt;high&lt;/code&gt;), triggering aggressive memory reclamation to reduce fragmentation&lt;/td&gt;
 &lt;td&gt;Linux kernel 4.x (exact minor version unknown)&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;15000&lt;/code&gt; (i.e., 1.5x original high watermark)&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;min_free_kbytes
 &lt;div id="min_free_kbytes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#min_free_kbytes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Calculate total min and other values from zoneinfo&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/zoneinfo | grep -E -w &lt;span style="color:#e6db74"&gt;&amp;#34;min|low|high&amp;#34;&lt;/span&gt;|grep -E -v &lt;span style="color:#e6db74"&gt;&amp;#34;high:&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/min/ { total_min += $2 }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/low/ { total_low += $2 }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/high/ { total_high += $2 }
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;END {
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; printf &amp;#34;Total min: %d KB\nTotal low: %d KB\nTotal high: %d KB\n&amp;#34;,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; total_min * 4, total_low * 4, total_high * 4;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Total min: &lt;span style="color:#ae81ff"&gt;15828844&lt;/span&gt; KB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Total low: &lt;span style="color:#ae81ff"&gt;19786048&lt;/span&gt; KB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Total high: &lt;span style="color:#ae81ff"&gt;23743260&lt;/span&gt; KB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Current system min value&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat min_free_kbytes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;15828849&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because there are other zones, the total min across all zones is approximately equal to min_free_kbytes. The Normal zone&amp;rsquo;s min is definitely slightly smaller than min_free_kbytes; you only need to focus on the Normal zone:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Normal zone min, low, high settings; page=4k&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/zoneinfo | grep -A &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; Normal | grep -E &lt;span style="color:#e6db74"&gt;&amp;#34;min|low|high&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; min &lt;span style="color:#ae81ff"&gt;3931615&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; low &lt;span style="color:#ae81ff"&gt;4914518&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; high &lt;span style="color:#ae81ff"&gt;5897422&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Before Linux kernel 4.6, min, low, and high had a fixed ratio, and you could only change low and high values by setting min_free_kbytes. &lt;strong&gt;min:low:high = 1:1.25:1.5&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Problems with the fixed ratio:&lt;/p&gt;
&lt;p&gt;Ideally, you&amp;rsquo;d want to raise low to more proactively trigger kswapd async reclamation and lower min to reduce direct reclaim. Before 4.6, you could only indirectly adjust low/high by adjusting min, using min to adjust kswapd&amp;rsquo;s delta working buffer. For example:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;th&gt;kswapd async reclamation working buffer (low-min)&lt;/th&gt;
 &lt;th&gt;kswapd async reclamation workload (high-low)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;min=1GB, low=1.25GB, high=1.5GB&lt;/td&gt;
 &lt;td&gt;0.25GB&lt;/td&gt;
 &lt;td&gt;0.25GB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;min=10GB, low=12.5GB, high=15GB&lt;/td&gt;
 &lt;td&gt;2.5GB&lt;/td&gt;
 &lt;td&gt;2.5GB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Raising min is done to raise low and high.&lt;/p&gt;
&lt;p&gt;An excessively low min value causes kswapd to not have time to asynchronously reclaim more memory before direct reclaim triggers. An excessively high min not only wastes memory but also causes more frequent reclamation activity, resulting in higher sys CPU usage. The default difference between low and min in Linux indeed seems a bit small.&lt;/p&gt;

&lt;h4 class="relative group"&gt;watermark_scale_factor
 &lt;div id="watermark_scale_factor" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#watermark_scale_factor" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Wouldn&amp;rsquo;t it be great if you could directly adjust min, low, and high? Sorry, the Linux kernel doesn&amp;rsquo;t support that (Android has extra_free_kbytes). But&amp;hellip;&lt;/p&gt;
&lt;p&gt;Since Linux kernel 4.x, the watermark_scale_factor parameter was added, allowing adjustment of the ratios between parameters — the ratio is no longer fixed. Its default value is 10, corresponding to 0.1% of memory (10/10000), with a maximum of 3000. When set to 1000, it means the difference between &amp;ldquo;low&amp;rdquo; and &amp;ldquo;min&amp;rdquo;, and between &amp;ldquo;high&amp;rdquo; and &amp;ldquo;low&amp;rdquo;, will both be 10% of memory size (1000/10000).&lt;/p&gt;
&lt;p&gt;0.1% is clearly too small — for 1TB of memory, the scale is only 1GB.&lt;/p&gt;

&lt;h4 class="relative group"&gt;watermark_boost_factor
 &lt;div id="watermark_boost_factor" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#watermark_boost_factor" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;watermark_boost_factor is used to optimize external memory fragmentation. It temporarily raises the zone&amp;rsquo;s watermark, i.e., zone-&amp;gt;watermark_boost, thereby raising the zone&amp;rsquo;s high watermark (WMARK_HIGH). This allows kswapd to reclaim more memory, making it easier for the memory compaction module (compactd kernel thread) to merge large blocks of contiguous physical memory. The default value of watermark_boost_factor is 15000, meaning the original high watermark is temporarily raised to 150%. Setting this to 0 disables the mechanism for temporarily raising zone watermarks&lt;sup id="fnref:10"&gt;&lt;a href="#fn:10" class="footnote-ref" role="doc-noteref"&gt;10&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;oom
 &lt;div id="oom" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oom" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The OOM Killer is a kernel module, not a process.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter Name&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;panic_on_oom&lt;/td&gt;
 &lt;td&gt;Controls system behavior when OOM occurs: &lt;strong&gt;0: Don&amp;rsquo;t trigger panic, start OOM Killer&lt;/strong&gt; 1: Trigger panic and halt 2: Trigger panic then attempt memory release&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_kill_allocating_task&lt;/td&gt;
 &lt;td&gt;Whether to preferentially kill the process that triggered OOM (rather than traversing the process tree to select the optimal target): 0: Disabled 1: Enabled&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_dump_tasks&lt;/td&gt;
 &lt;td&gt;Whether to dump all task information when OOM occurs (for post-mortem analysis): 0: Disabled 1: Enabled&lt;/td&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;oom_score
 &lt;div id="oom_score" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oom_score" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When OOM occurs, the system needs to decide which process to kill based on the OOM score. Each user process has 3 OOM score interface files:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r--r-- 1 postgres postgres 0 May 24 16:39 /proc/63766/oom_adj
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-r--r--r-- 1 postgres postgres 0 May 24 16:39 /proc/63766/oom_score
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r--r-- 1 postgres postgres 0 May 24 16:39 /proc/63766/oom_score_adj&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;oom_score is a dynamically calculated OOM score by the system, influenced at least by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Many child processes: +points&lt;/li&gt;
&lt;li&gt;Long-running: -points&lt;/li&gt;
&lt;li&gt;Low nice value: +points (nice value represents process CPU time slice priority. Lower nice values mean higher priority, more CPU time slice allocation)&lt;/li&gt;
&lt;li&gt;Direct hardware access: -points&lt;sup id="fnref:11"&gt;&lt;a href="#fn:11" class="footnote-ref" role="doc-noteref"&gt;11&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition to the Linux-calculated OOM score, adjustments (adj) can be manually applied. oom_adj is from earlier Linux kernel versions; it&amp;rsquo;s best to adjust OOM scores through the oom_score_adj interface file.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter/File&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;th&gt;Example Values&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_score&lt;/td&gt;
 &lt;td&gt;Kernel-calculated raw score (dynamic)&lt;/td&gt;
 &lt;td&gt;0~1000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_score_adj&lt;/td&gt;
 &lt;td&gt;User-defined adjustment value, directly affects final score&lt;/td&gt;
 &lt;td&gt;-1000~1000; -1000 equivalent to disabling OOM&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;oom_adj (legacy)&lt;/td&gt;
 &lt;td&gt;Legacy adjustment parameter, range -17~15&lt;/td&gt;
 &lt;td&gt;-17~15&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;lowmem_reserve_ratio
 &lt;div id="lowmem_reserve_ratio" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lowmem_reserve_ratio" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Besides &lt;code&gt;min_free_kbytes&lt;/code&gt;, there&amp;rsquo;s another minimum memory reserve parameter that can cause process memory allocation failures, but their functions differ significantly.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lowmem_reserve_ratio&lt;/code&gt; is a key kernel parameter used to protect low-end memory (DMA, DMA32) from being excessively consumed by high-end memory allocation requests. lowmem_reserve_ratio is just a coefficient, not a directly usable number; the kernel calculates the reserved page count for each zone.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Default values below&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/sys/vm/lowmem_reserve_ratio 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;256&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;256&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Memory zones are ordered by priority from low to high: DMA → DMA32 → Normal → HighMem. Allocation requests from higher-priority zones can &amp;ldquo;borrow&amp;rdquo; memory from lower-priority zones, but must reserve a certain proportion of memory for use by the lower-priority zones.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/zoneinfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;Node 0|protection|free&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pages free &lt;span style="color:#ae81ff"&gt;3976&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; protection: &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0, 2484, 386430, 386430&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA32
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pages free &lt;span style="color:#ae81ff"&gt;415741&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; protection: &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0, 0, 383946, 383946&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pages free &lt;span style="color:#ae81ff"&gt;5658528&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; protection: &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0, 0, 0, 0&lt;span style="color:#f92672"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For example, DMA&amp;rsquo;s protection indicates:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;0: Allocation from this zone, no cross-zone allocation restrictions&lt;/li&gt;
&lt;li&gt;2484: Pages DMA reserves for DMA32 zone allocations&lt;/li&gt;
&lt;li&gt;386430: Pages DMA reserves for Normal zone allocations&lt;/li&gt;
&lt;li&gt;386430: Reserved extension field, meaningless in this context&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Based on these settings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When DMA32 zone requests memory from DMA zone, 3976 &amp;gt; 2484, it may succeed&lt;/li&gt;
&lt;li&gt;When Normal zone requests memory from DMA zone, 3976 &amp;lt; 386430, it will not succeed&lt;/li&gt;
&lt;li&gt;Requests from lower zones to higher zones are not subject to this restriction&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;misc
 &lt;div id="misc" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#misc" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A few more related parameters; those with less relevance are not listed:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Purpose&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;nr_hugepages&lt;/td&gt;
 &lt;td&gt;Number of huge pages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;del&gt;nr_overcommit_hugepages&lt;/del&gt;&lt;/td&gt;
 &lt;td&gt;Overcommit of huge pages; The maximum is nr_hugepages + nr_overcommit_hugepages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;del&gt;nr_hugepages_mempolicy&lt;/del&gt;&lt;/td&gt;
 &lt;td&gt;NUMA-localized huge page allocation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;del&gt;hugetlb_shm_group&lt;/del&gt;&lt;/td&gt;
 &lt;td&gt;Shared memory permission control&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;del&gt;hugetlb_optimize_vmemmap&lt;/del&gt;&lt;/td&gt;
 &lt;td&gt;Restructure huge page metadata management model, reducing memory usage of huge page metadata (struct page). Supported since Linux kernel 5.13&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;max_map_count&lt;/td&gt;
 &lt;td&gt;Limits the maximum number of memory mapping regions (VMA) a single process can have, default 65530&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;zone_reclaim_mode&lt;/td&gt;
 &lt;td&gt;Memory reclamation policy under NUMA, e.g., allocating memory from other nodes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;stat_interval&lt;/td&gt;
 &lt;td&gt;VM stat refresh frequency, default 1 second&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;vfs_cache_pressure&lt;/td&gt;
 &lt;td&gt;Parameter for VFS (Virtual File System) cache reclamation pressure, mainly affecting the aggressiveness of kernel reclaiming dentry and inode caches&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;page-cluster&lt;/td&gt;
 &lt;td&gt;Swap readahead, swaps multiple pages to swap partition at once. Default 3, i.e., 8 pages at once&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h2 class="relative group"&gt;OS Memory Observation and Calculation
 &lt;div id="os-memory-observation-and-calculation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#os-memory-observation-and-calculation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;/proc/meminfo, /proc/vmstat, /proc/zoneinfo all contain memory information, much of it duplicative. I won&amp;rsquo;t list the differences — a glance tells you what&amp;rsquo;s what.&lt;/p&gt;

&lt;h3 class="relative group"&gt;free available Calculation (Unfinished)
 &lt;div id="free-available-calculation-unfinished" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#free-available-calculation-unfinished" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;General direction: (NR_FREE_PAGES + NR_FILE_PAGES - NR_SHMEM + NR_SWAP_PAGES + NR_SLBA_RECLAIMABLE - TOTALRESERVE_PAGES - root reserved memory)&lt;/p&gt;
&lt;p&gt;The kernel has its own estimated available memory. Directly calculating the available value using a formula is difficult to get exactly right:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Not very accurate, don&amp;#39;t use&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;MemFree|Active\(file\)|Inactive\(file\)|SwapFree|SReclaimable|nr_shmem|Shmem&amp;#34;&lt;/span&gt; |awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2} NR==4 {d=$2} NR==5 {e=$2} NR==6 {f=$2 ;print (a+b+c+d-e+f)}&amp;#39;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;MemAvailable&amp;#34;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;inactive_anon + active_anon != anon
 &lt;div id="inactive_anon--active_anon--anon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inactive_anon--active_anon--anon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Why?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Primary: Shmem separately counts shared memory pages. nr_anon_pages does not include shared memory pages, while nr_inactive_anon and nr_active_anon include anonymous shared memory pages&lt;/li&gt;
&lt;li&gt;Secondary: anon includes some Unevictable pages (Mlocked is a subset of Unevictable)&lt;/li&gt;
&lt;li&gt;Other minor statistical differences have little impact&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A rough but relatively accurate formula: nr_inactive_anon + nr_active_anon + nr_unevictable - nr_shmem&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Applicable under huge pages; not applicable under NUMA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## /proc/meminfo, /proc/zoneinfo, /proc/vmstat can all be used for calculation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#/proc/vmstat&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_computed : &amp;#34;&lt;/span&gt;;cat /proc/vmstat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_inactive_anon|nr_active_anon|nr_unevictable|nr_shmem&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2} NR==4 {d=$2; print (a+b+c-d)}&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_real : &amp;#34;&lt;/span&gt;;cat /proc/vmstat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_anon_pages&amp;#34;&lt;/span&gt;|awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_computed : &lt;span style="color:#ae81ff"&gt;15776924&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_real : &lt;span style="color:#ae81ff"&gt;15772671&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;##/proc/zoneinfo Normal&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_normal_computed : &amp;#34;&lt;/span&gt;; cat /proc/zoneinfo |grep Normal -A 50|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_inactive_anon|nr_active_anon|nr_unevictable|nr_shmem&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2} NR==4 {d=$2; print (a+b+c-d)}&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_normal_real : &amp;#34;&lt;/span&gt;; cat /proc/zoneinfo |grep Normal -A 50|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;nr_anon_pages&amp;#34;&lt;/span&gt;|awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_normal_computed : &lt;span style="color:#ae81ff"&gt;15711170&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_normal_real : &lt;span style="color:#ae81ff"&gt;15707402&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;cache Calculation
 &lt;div id="cache-calculation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cache-calculation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The buff/cache shown in the free command can be calculated from file pages or cache itself:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;filepage+shmem: &amp;#34;&lt;/span&gt;;cat /proc/meminfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;Buffers|Active\(file\)|Inactive\(file\)|Shmem|SReclaimable&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2} NR==4 {d=$2} NR==5 {e=$2 ;print (a+b+c+d+e)}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;cached: &amp;#34;&lt;/span&gt;;cat /proc/meminfo |grep -Ew &lt;span style="color:#e6db74"&gt;&amp;#34;Buffers|Cached|SReclaimable&amp;#34;&lt;/span&gt; | awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2 ;print (a+b+c)}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;free -k;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Execution results:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;filepage+shmem: &lt;span style="color:#ae81ff"&gt;289417584&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cached: &lt;span style="color:#ae81ff"&gt;289419156&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total used free shared buff/cache available
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mem: &lt;span style="color:#ae81ff"&gt;395721236&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;79633516&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26668564&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;84704912&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;289419156&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;178501152&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;5242876&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5242876&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Controversy: Does shmem Count as cache?
 &lt;div id="controversy-does-shmem-count-as-cache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#controversy-does-shmem-count-as-cache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Clearly, the calculation above includes shmem in cache. Theoretically, shmem shouldn&amp;rsquo;t be part of cache.&lt;/p&gt;
&lt;p&gt;In fact, the kernel community has discussed this&lt;a href="https://lore.kernel.org/all/YS0Eq&amp;#43;tNe4Pr7O0X@casper.infradead.org/T/" target="_blank" rel="noreferrer"&gt;Why is Shmem included in Cached in /proc/meminfo?&lt;/a&gt;, wanting to remove shared memory from cache:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;	cached &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;global_node_page_state&lt;/span&gt;(NR_FILE_PAGES) &lt;span style="color:#f92672"&gt;-&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;			&lt;span style="color:#a6e22e"&gt;total_swapcache_pages&lt;/span&gt;() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; i.bufferram;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt;	cached &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;global_node_page_state&lt;/span&gt;(NR_FILE_PAGES) &lt;span style="color:#f92672"&gt;-&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt;			&lt;span style="color:#a6e22e"&gt;total_swapcache_pages&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt;			&lt;span style="color:#f92672"&gt;-&lt;/span&gt; i.bufferram &lt;span style="color:#f92672"&gt;-&lt;/span&gt; i.sharedram;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;But modifying this involves forward compatibility concerns. The question comes down to: which is more important — forward compatibility or improving the accuracy of a piece of information?&lt;/p&gt;
&lt;p&gt;Currently, there&amp;rsquo;s no good resolution; that&amp;rsquo;s the status quo.&lt;/p&gt;
&lt;p&gt;The email thread also discusses some interesting things:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Another point of view is that everything in tmpfs is part of the page
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cache and can be written out to swap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- Dirty: total amount of RAM used to buffer data to be written on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;permanent storage (dirty). Gets converted to Cached when write is
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;complete. (Actually I would call this &amp;#34;Buffers&amp;#34; but Dirty is okay, too.)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- Cached: total amount of RAM used to improve *performance* that can be
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;*immediately dropped* without any data-loss – note that this includes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;all untouched RAM backed by swap.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;- Shared: total amount of RAM shared between multiple process that
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cannot be freed even if any single process gets killed. (If this is even
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;possible to know - note that this would *only* contain COW pages in
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;practice. We already have Committed_AS which is about as good for real
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;world heuristics.)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;cache does not include dirty pages, and can be directly dropped without data loss&lt;/li&gt;
&lt;li&gt;tmpfs is swapout&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Shared memory appears to be swapout, which is clearly different from cache pages that can be directly dropped. PostgreSQL&amp;rsquo;s shared memory clearly cannot be directly dropped.&lt;/p&gt;
&lt;p&gt;So for PostgreSQL, the fact that cache contains shared memory is quite important — don&amp;rsquo;t assume by default that it doesn&amp;rsquo;t.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Memory Page Statistics Often Don&amp;rsquo;t Add Up
 &lt;div id="memory-page-statistics-often-dont-add-up" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-page-statistics-often-dont-add-up" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When calculating memory pages, some calculations don&amp;rsquo;t add up. Summary of reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;shmem is counted in cache&lt;/li&gt;
&lt;li&gt;Cannot see file-mapped and anonymous-mapped pages within shmem&lt;/li&gt;
&lt;li&gt;nr_anon_pages does not include shared memory pages, while nr_inactive_anon and nr_active_anon include anonymous shared memory pages&lt;/li&gt;
&lt;li&gt;VM and cgroup have slightly different statistical scopes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;cgroup v1
 &lt;div id="cgroup-v1" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-v1" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;cgroup Memory Management
 &lt;div id="cgroup-memory-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-memory-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;cgroup can observe and limit the usage of anonymous pages, file pages, swap cache, and kernel memory. Each memcg has its own independent LRU; there is no concept of a GLOBAL LRU.&lt;/p&gt;
&lt;p&gt;cgroup memory management differs from cgroup CPU management. A task can request lots of CPU work; reaching the cgroup CPU limit can extend execution time to handle it. However, the memory a task occupies is working memory — a task uses the same physical memory.&lt;/p&gt;
&lt;p&gt;Key differences between cgroup CPU and memory management:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Memory must be managed through reuse and reclamation; a task&amp;rsquo;s working memory is truly occupied and cannot be used by other tasks. CPU is managed through time allocation; other tasks or cgroups can use it.&lt;/li&gt;
&lt;li&gt;Memory needs to be instantly available; CPU works through time slicing — time can be dispersed.&lt;/li&gt;
&lt;li&gt;CPU control&amp;rsquo;s core is time allocation; Memory Control&amp;rsquo;s core is page counting.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;The core of the design is a counter called the page_counter. The
page_counter tracks the current memory usage and limit of the group of
processes associated with the controller&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Memory Control&amp;rsquo;s core is page counting, meaning it&amp;rsquo;s not that physical pages are statically assigned. The memory allocated this time, when released back to free after use, most likely won&amp;rsquo;t be the same physical page next time&lt;sup id="fnref:12"&gt;&lt;a href="#fn:12" class="footnote-ref" role="doc-noteref"&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Physical pages know which cgroup they belong to:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				+--------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				| mem_cgroup |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				| &lt;span style="color:#f92672"&gt;(&lt;/span&gt;page_counter&lt;span style="color:#f92672"&gt;)&lt;/span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				+--------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 / ^ &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				/ | &lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------+ | +---------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | mm_struct | |.... | mm_struct |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------+ | +---------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; + --------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------+ +------+--------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | page +----------&amp;gt; page_cgroup|
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------+ +---------------+&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;mm_struct represents virtual memory. Each virtual memory knows which cgroup it belongs to; each physical page can point to page_cgroup, meaning it knows which cgroup this physical memory belongs to&lt;sup id="fnref1:12"&gt;&lt;a href="#fn:12" class="footnote-ref" role="doc-noteref"&gt;12&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;cgroup Parameters and Metrics
 &lt;div id="cgroup-parameters-and-metrics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-parameters-and-metrics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;cgroup uses interface files for configuration and viewing memory usage.&lt;/p&gt;
&lt;p&gt;Directory: &lt;code&gt;cd /sys/fs/cgroup/memory/xxx/&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Kernel memory and mem+swap can have separate settings or usage viewing:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;memory.kmem.xxx #kernel mem
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;memory.memsw.xxx #mem+swap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Below, we only look at mem-related items.&lt;/p&gt;
&lt;p&gt;Interface files can be divided into three categories:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read-only — show usage, permissions: &lt;code&gt;-r--r--r--&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Read-write — control parameters, permissions: &lt;code&gt;-rw-r--r--&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Other — special settings, permissions: other&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Specific meanings are as follows, with important parameters highlighted:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Type&lt;/th&gt;
 &lt;th&gt;Interface File&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-only&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.numa_stat&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;NUMA-dimensional memory stats&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-only&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;memory.stat&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Important&lt;/strong&gt;, the primary memory usage interface file with many metrics; analyzed separately below&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-only&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;memory.usage_in_bytes&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;usage_in_bytes is affected by the method and doesn&amp;rsquo;t show &amp;rsquo;exact&amp;rsquo; value of memory. Not recommended for viewing cgroup memory usage&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-only&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;memory.failcnt&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Number of times memory usage exceeded &lt;code&gt;memory.limit_in_bytes&lt;/code&gt;, cumulative&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;cgroup.clone_children&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Controls whether child cgroups inherit parent configuration&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;&lt;code&gt;cgroup.procs&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Used to manage process groups (process IDs, PIDs) in the current cgroup. &lt;strong&gt;For multi-process PostgreSQL, this means writing all PG processes, including management processes and backends, into the &lt;code&gt;procs&lt;/code&gt; file&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;tasks&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Used to manage threads (thread IDs, TIDs) in the current cgroup. When writing a process PID to &lt;code&gt;cgroup.procs&lt;/code&gt;, all its thread TIDs are automatically added to &lt;code&gt;tasks&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;notify_on_release&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;Controls whether a release operation is triggered when the last task (process or thread) in the cgroup exits. Would only be enabled for container management; traditional cgroup management keeps it disabled by default. Cgroups should be preserved after database restart&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;memory.move_charge_at_immigrate&lt;/td&gt;
 &lt;td&gt;Deprecated in v2. Charge attribution rules when migrating cgroups&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;memory.use_hierarchy&lt;/td&gt;
 &lt;td&gt;Whether parent cgroup limits child cgroups&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;memory.limit_in_bytes&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;cgroup memory upper limit&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;memory.soft_limit_in_bytes&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Reclaim the portion exceeding the soft limit&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;memory.max_usage_in_bytes&lt;/td&gt;
 &lt;td&gt;cgroup usage peak, an observation metric&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;memory.oom_control&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;oom_kill_disable 1 — disable OOM&lt;br&gt;under_oom 0 — whether currently in OOM state&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read-write&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;memory.swappiness&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;cgroup-level swappiness&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Other&lt;/td&gt;
 &lt;td&gt;memory.force_empty&lt;/td&gt;
 &lt;td&gt;Write only; writing &lt;code&gt;0&lt;/code&gt; forces release of all cgroup memory&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Other&lt;/td&gt;
 &lt;td&gt;cgroup.event_control&lt;/td&gt;
 &lt;td&gt;Event notification interface, listens for memory pressure events, requires programming. Often used with memory.pressure_level&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Other&lt;/td&gt;
 &lt;td&gt;memory.pressure_level&lt;/td&gt;
 &lt;td&gt;Memory pressure notification level&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Using a PG instance to explain the meaning of various metrics in memory.stat.&lt;/p&gt;
&lt;p&gt;This PG instance is configured as:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_memory_type&lt;span style="color:#f92672"&gt;=&lt;/span&gt;mmap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_buffers&lt;span style="color:#f92672"&gt;=&lt;/span&gt;64GB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;approximately &lt;span style="color:#ae81ff"&gt;800&lt;/span&gt; clients, running&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat memory.stat
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cache &lt;span style="color:#ae81ff"&gt;345587761152&lt;/span&gt; 						 &lt;span style="color:#75715e"&gt;#page cache!!!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rss &lt;span style="color:#ae81ff"&gt;27332608&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Anonymous and swap cache memory size. Note: differs from OS process RSS; clearly doesn&amp;#39;t include PG shared memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rss_huge &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#75715e"&gt;#of bytes of anonymous transparent hugepages. Note: anonymous huge pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mapped_file &lt;span style="color:#ae81ff"&gt;61491769344&lt;/span&gt; &lt;span style="color:#75715e"&gt;#File shared memory size; includes PG shared memory here&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;swap &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#75715e"&gt;#On swap partition&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgpgin &lt;span style="color:#ae81ff"&gt;389395357&lt;/span&gt; &lt;span style="color:#75715e"&gt;#rss+cache charged pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgpgout &lt;span style="color:#ae81ff"&gt;305016672&lt;/span&gt; &lt;span style="color:#75715e"&gt;#rss+cache uncharged pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgfault &lt;span style="color:#ae81ff"&gt;1954040341&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Omitted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pgmajfault &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Omitted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inactive_anon &lt;span style="color:#ae81ff"&gt;165728256&lt;/span&gt; &lt;span style="color:#75715e"&gt;#anonymous and swap cache memory on inactive LRU&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;active_anon &lt;span style="color:#ae81ff"&gt;61549518848&lt;/span&gt; &lt;span style="color:#75715e"&gt;#anonymous and swap cache memory on active LRU list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inactive_file &lt;span style="color:#ae81ff"&gt;138240962560&lt;/span&gt; &lt;span style="color:#75715e"&gt;#file-backed on inactive LRU list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;active_file &lt;span style="color:#ae81ff"&gt;145658613760&lt;/span&gt; &lt;span style="color:#75715e"&gt;#file-backed memory on active LRU list&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;unevictable &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#75715e"&gt;#Unreclaimable memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hierarchical_memory_limit &lt;span style="color:#ae81ff"&gt;408021893120&lt;/span&gt; &lt;span style="color:#75715e"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hierarchical_memsw_limit &lt;span style="color:#ae81ff"&gt;9223372036854771712&lt;/span&gt; &lt;span style="color:#75715e"&gt;#&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_xxx &lt;span style="color:#75715e"&gt;#hierarchical &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Roughly (ignoring swap), cache+rss = inactive_anon+active_anon+inactive_file+active_file.&lt;/p&gt;
&lt;p&gt;These values are quite convoluted. cache+rss doesn&amp;rsquo;t have a straightforward correspondence with [in]active_anon/file, and mapped_file (shared memory) is hard to categorize, making it easy to get confused. Combining various documentation and testing, I hand-rolled the following script:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#cginfo_lzl&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;shared_mem_mapped : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;mapped_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2 / 1024 / 1024 /1024 }&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;shared_mem_anon : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|inactive_anon|active_anon&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2} NR==3 {c=$2; print (b + c -a)/1024/1024/1024}&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;pagecache : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;cache&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2 / 1024 / 1024 /1024 }&amp;#39;&lt;/span&gt; ;&lt;span style="color:#ae81ff"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;pagecache_cache-share : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;cache|mapped_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;NR==1 {a=$2} NR==2 {b=$2; print (a - b)/1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\\&lt;/span&gt;n
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;file_total : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;inactive_file|active_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;anon_total : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;inactive_anon|active_anon&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_used_rss+map : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|mapped_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_mem_file+rss+map : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;inactive_file|active_file|rss|mapped_file&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_mem_rss+cache : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|cache&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_mem_anon+file : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;inactive_file|active_file|inactive_anon|active_anon&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;total_memsw : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.stat|egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|cache|swap&amp;#34;&lt;/span&gt;| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum += $2} END {print sum /1024/1024/1024}&amp;#39;&lt;/span&gt;;&lt;span style="color:#ae81ff"&gt;\\&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo -n &lt;span style="color:#e6db74"&gt;&amp;#34;hard_limit : &amp;#34;&lt;/span&gt;;cat /sys/fs/cgroup/memory/$PGNAME/memory.limit_in_bytes| awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $1 / 1024 / 1024 /1024 }&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Database with shared_buffers=2GB&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_mapped : 1.69063
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_anon : 1.69828
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache : 5.94717
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache_cache-share : 4.25654
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;file_cache : 4.24889
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_cache : 3.23096
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_used_rss+map : 3.2233
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_file+rss+map : 7.47219
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_rss+cache : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_anon+file : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_memsw : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hard_limit : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Differences Between cgroup RSS and Process RSS
 &lt;div id="differences-between-cgroup-rss-and-process-rss" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#differences-between-cgroup-rss-and-process-rss" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#shared_buffers= 64GB, all PG process RSS sorted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ps -eo pid,ppid,rss,args |grep &lt;span style="color:#e6db74"&gt;`&lt;/span&gt;cat $PGDATA/postmaster.pid|head -1&lt;span style="color:#e6db74"&gt;`&lt;/span&gt;|sort -k3 -rn
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97632&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61103720&lt;/span&gt; postgres: lzlinst: checkpointer 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97633&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;59045152&lt;/span&gt; postgres: lzlinst: background writer 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2322820&lt;/span&gt; /paic/postgres/base/11.3/bin/postgres -D /paic/pg6888/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97637&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;85116&lt;/span&gt; postgres: lzlinst: pgsentinel 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97697&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19620&lt;/span&gt; postgres: lzlinst: dbmgr users &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97634&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17932&lt;/span&gt; postgres: lzlinst: walwriter 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;250063&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14508&lt;/span&gt; postgres: lzlinst: dbmon postgres &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97636&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13220&lt;/span&gt; postgres: lzlinst: stats collector 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;248777&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11576&lt;/span&gt; postgres: lzlinst: dbmon postgres &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97635&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2980&lt;/span&gt; postgres: lzlinst: autovacuum launcher 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97638&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2376&lt;/span&gt; postgres: lzlinst: logical replication launcher 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;97630&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1592&lt;/span&gt; postgres: lzlinst: logger 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;250185&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;39130&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;972&lt;/span&gt; grep --color&lt;span style="color:#f92672"&gt;=&lt;/span&gt;auto &lt;span style="color:#ae81ff"&gt;97627&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Generally, the PG processes with the highest RSS values are checkpointer and bgwriter, because RSS represents actual memory used, including shared memory, and these two processes that flush shared buffer dirty pages occupy the most. Backends with excessive data queries may also have higher RSS values, but this is usually caused by data extracts or slow full-scan queries.&lt;/p&gt;
&lt;p&gt;Why is postmaster&amp;rsquo;s RSS so small? Because postmaster itself doesn&amp;rsquo;t need to do much shared_buffer operations; it only needs to open up the shared memory virtual address space and fork it for other processes to use.&lt;/p&gt;
&lt;p&gt;PM&amp;rsquo;s child processes have the same shared memory address but not necessarily the same RSS:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/97632/smaps |grep -A &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zero&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#checkpointer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b4fd87cf000-2b60a2143000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;15925397&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;70411728&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;61087812&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;31429895&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/97633/smaps |grep -A &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zero&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#bgwriter&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b4fd87cf000-2b60a2143000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;15925397&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;70411728&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;59043388&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;29394787&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/97627/smaps |grep -A &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zero&amp;#34;&lt;/span&gt; &lt;span style="color:#75715e"&gt;#postmaster&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b4fd87cf000-2b60a2143000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;15925397&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;70411728&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;2318408&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;1741764&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Above, checkpointer and bgwriter occupy the most RSS, and most of their RSS is shared memory. These two processes almost evenly split the entire actually-used shared memory, while postmaster doesn&amp;rsquo;t use much. PM and all its forked child processes have the same shared memory virtual address.&lt;/p&gt;
&lt;p&gt;But cgroup RSS is only a few tens of MB, far less than process RSS:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /sys/fs/cgroup/memory/lzlinst/memory.stat |egrep -w &lt;span style="color:#e6db74"&gt;&amp;#34;rss|mapped_file&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rss &lt;span style="color:#ae81ff"&gt;88997888&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mapped_file &lt;span style="color:#ae81ff"&gt;52963262464&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can see that PG shared memory is not in the cgroup stat RSS. cgroup RSS doesn&amp;rsquo;t count file pages or shared file pages.&lt;/p&gt;
&lt;p&gt;linux kernel&lt;sup id="fnref2:12"&gt;&lt;a href="#fn:12" class="footnote-ref" role="doc-noteref"&gt;12&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Only anonymous and swap cache memory is listed as part of &amp;lsquo;rss&amp;rsquo; stat. This should not be confused with the true &amp;lsquo;resident set size&amp;rsquo; or the amount of physical memory used by the cgroup.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Process vs. cgroup memory statistics differences&lt;sup id="fnref:13"&gt;&lt;a href="#fn:13" class="footnote-ref" role="doc-noteref"&gt;13&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: left"&gt;Memory&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Single Process&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Process &lt;code&gt;cgroup(memcg)&lt;/code&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;cache&lt;/code&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;None&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;PageCache&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;mapped_file&lt;/code&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;None&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;file_rss + shmem_rss&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;RSS&lt;/code&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;anon_rss + file_rss ＋ shmem_rss&lt;/code&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;code&gt;anon_rss&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;For PostgreSQL, the RSS in stat does not include file map shared memory. The PG official documentation describes mmap as anonymous shared memory:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Possible values are &lt;code&gt;mmap&lt;/code&gt; (for anonymous shared memory allocated using &lt;code&gt;mmap&lt;/code&gt;), &lt;code&gt;sysv&lt;/code&gt; (for System V shared memory allocated via &lt;code&gt;shmget&lt;/code&gt;)&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;cgroup counts PG mmap memory as mapped_file.&lt;/p&gt;
&lt;p&gt;Observing sysv and huge page scenarios, summary of PG&amp;rsquo;s memory.stat metrics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;RSS in stat does not include file map shared memory. Observation shows that regardless of mmap or sysv, RSS does not contain PG shared memory&lt;/li&gt;
&lt;li&gt;Similarly, rss_huge also does not include file map shared huge page memory. Observation shows that even with huge pages enabled, stat does not contain PG shared memory&lt;/li&gt;
&lt;li&gt;Without huge pages, PG shared memory (mmap or sysv) is all counted under memory.stat mapped_file; with huge pages, it&amp;rsquo;s in none of the stat metrics, including rss_huge&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Where Exactly Is mapped_file?
 &lt;div id="where-exactly-is-mapped_file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#where-exactly-is-mapped_file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2cf787e36cc8.png" alt="RHEL Memory Usage Patterns" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;mapped_file is in cache, and also in inactive_anon+active_anon&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;mapped_file can also be anonymous; both mmap and sysv are counted here&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Database with shared_buffers=2GB&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_mapped : 1.69063
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_anon : 1.69828
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache : 5.94717
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache_cache-share : 4.25654
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;file_cache : 4.24889
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_cache : 3.23096
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_used_rss+map : 3.2233
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_file+rss+map : 7.47219
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_rss+cache : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_anon+file : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_memsw : 7.47984
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hard_limit : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;soft_limit_in_bytes
 &lt;div id="soft_limit_in_bytes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#soft_limit_in_bytes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Soft limit (&lt;code&gt;memory.soft_limit_in_bytes&lt;/code&gt;) is a non-enforced constraint in cgroup memory management. When a cgroup&amp;rsquo;s memory usage exceeds the soft limit, the system does not immediately force memory reclamation. Instead, it will &lt;strong&gt;preferentially reclaim the excess memory&lt;/strong&gt; of that cgroup &lt;strong&gt;when global memory pressure is high&lt;/strong&gt; (e.g., when overall system free memory is insufficient).&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Trigger condition&lt;/strong&gt;: Global memory pressure (e.g., insufficient system free memory).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Call path&lt;/strong&gt;: &lt;code&gt;kswapd&lt;/code&gt; → &lt;code&gt;balance_pgdat&lt;/code&gt; → check cgroup soft limits → trigger reclamation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reclamation target&lt;/strong&gt;: Preferentially reclaim memory pages from cgroups exceeding their soft limits.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;+-------------------+ Global memory pressure detection +-------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| kswapd thread | ------------------------------------&amp;gt; | balance_pgdat |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;+-------------------+ +-------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | Traverse memory zones and check
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; v
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | Check each cgroup&amp;#39;s soft |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | limit usage |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | Trigger reclamation for over-limit cgroups
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; v
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------------------+
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | Page reclamation (LRU list |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | scanning, etc.) |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; +---------------------------+&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The soft_limit_in_bytes mechanism is very similar to high. In v2, soft_limit_in_bytes has been deprecated, replaced by three new parameters: min, low, and high.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Impact of Overselling on pagecache
 &lt;div id="impact-of-overselling-on-pagecache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#impact-of-overselling-on-pagecache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To be discussed later&lt;/p&gt;

&lt;h3 class="relative group"&gt;cg oom
 &lt;div id="cg-oom" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cg-oom" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Normally, if sharedbuffer = 1/4 of cg mem, then without counting private memory, pagecache can reach up to 3/4 of cg mem. Generally, normal business private memory usage won&amp;rsquo;t be very high. If cg mem is full, memory can be reclaimed from cg pagecache (this is direct memory reclamation; AliOS has implemented async background reclamation: &lt;a href="https://help.aliyun.com/zh/alinux/user-guide/memcg-backend-asynchronous-reclaim?spm=a2c4g.11186623.0.0.562f42bammLZmK" target="_blank" rel="noreferrer"&gt;Memcg Background Async Reclamation&lt;/a&gt;). So the best way to test cg oom is to use sessions that consume lots of private memory rather than stress testing.&lt;/p&gt;
&lt;p&gt;Test case:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Observe score&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-r--r--r-- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; May &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; 16:39 /proc/63766/oom_score
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rss &lt;span style="color:#75715e"&gt;# whichever command you like&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## A SQL that can consume lots of private memory, many union alls create many plan nodes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql -d lzldb -tX -c &lt;span style="color:#e6db74"&gt;&amp;#34;create table lzl1(col1 varchar(1));&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql -tX -c &lt;span style="color:#e6db74"&gt;&amp;#34;\o sqltext.sql&amp;#34;&lt;/span&gt; -c &lt;span style="color:#e6db74"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;SELECT &amp;#39;select col1 from lzl1&amp;#39; || &amp;#39; union all&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;FROM generate_series(1, 100000)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;UNION ALL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;SELECT &amp;#39;select col1 from lzl1;&amp;#39;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;FROM generate_series(1, 1);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Adjust stack parameter otherwise SQL will be aborted&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql -d lzldb -c &lt;span style="color:#e6db74"&gt;&amp;#34;set max_stack_depth=1024000&amp;#34;&lt;/span&gt; -f sqltext.sql&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;cg oom off:&lt;/p&gt;
&lt;p&gt;wchan shows OOM information, even an oom score, but the process won&amp;rsquo;t be killed by the OOM killer&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## vm oom enabled; 0: don&amp;#39;t trigger panic, start OOM Killer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/sys/vm/panic_on_oom 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## cg oom disabled; 1: disable oom&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /sys/fs/cgroup/memory/$PGNAME/memory.oom_control
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oom_kill_disable &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;under_oom &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ps -eo user,ppid,pid,state,%cpu,%mem,stime,wchan:14,args,rss,vsz,sig_block |grep &lt;span style="color:#e6db74"&gt;`&lt;/span&gt;head -1 $PGDATA/postmaster.pid&lt;span style="color:#e6db74"&gt;`&lt;/span&gt; |grep -v grep 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;870&lt;/span&gt; D 0.0 0.0 10:54 mem_cgroup_oom postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;7216&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2807460&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3417&lt;/span&gt; S 0.0 0.0 10:55 pipe_wait postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;22944&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2808540&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13069&lt;/span&gt; D 0.0 0.0 11:10 mem_cgroup_oom postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;11944&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2808348&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13104&lt;/span&gt; D 0.0 0.0 11:10 mem_cgroup_oom postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;12224&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2808348&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;19005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14352&lt;/span&gt; D 0.0 0.0 11:10 mem_cgroup_oom postgres: pg3ymhp2: lzluser &lt;span style="color:#ae81ff"&gt;11680&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2808348&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000000000000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /sys/fs/cgroup/memory/$PGNAME/memory.oom_control
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oom_kill_disable &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;under_oom &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/97994/oom_score
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_mapped : 2.00019
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;shared_mem_anon : 2.0023
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache : 2.0023
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pagecache_cache-share : 0.00211334
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;file_cache : &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;anon_cache : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_used_rss+map : 7.99789
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_file+rss+map : 7.99789
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_rss+cache : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mem_anon+file : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_memsw : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hard_limit : &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Currently, it appears that PG processes may also crash when unable to allocate memory. For example, if walwriter crashes, it can cause all other processes to crash.&lt;/p&gt;
&lt;p&gt;cg oom on:&lt;/p&gt;
&lt;p&gt;User processes are killed due to high OOM score, sent kill -9. Most PG processes crash; postmaster &lt;code&gt;reset_shared()&lt;/code&gt; then automatically restarts other processes. Both message and dmesg show out-of-memory information:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#cg oom enabled&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oom_kill_disable &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg log:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2025-05-29 19:10:45.945 CST,,,198877,,6838374d.308dd,4,,2025-05-29 18:30:37 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;server process (PID 236413) was terminated by signal 9: Killed&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;Failed process was running: select col1 from lzl1 union all
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;message:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;May 29 19:10:45 lzlhost kernel: Memory cgroup stats for /t1lzldb: cache:8392988KB rss:8384228KB rss_huge:0KB mapped_file:7458316KB swap:0KB inactive_anon:1310184KB active_anon:15467032KB inactive_file:0KB active_file:0KB unevictable:0KB
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;May 29 19:10:45 lzlhost kernel: Memory cgroup out of memory: Kill process 236413 (postgres) score 497 or sacrifice child
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;dmesg:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;[Thu May 29 18:26:27 2025] Memory cgroup stats for /t1lzldb: cache:8392988KB rss:8384228KB rss_huge:0KB mapped_file:7458316KB swap:0KB inactive_anon:1310184KB active_anon:15467032KB inactive_file:0KB active_file:0KB unevictable:0KB
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;[Thu May 29 18:26:27 2025] Memory cgroup out of memory: Kill process 236413 (postgres) score 497 or sacrifice child
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;[Thu May 29 18:26:27 2025] Killed process 236413 (postgres) total-vm:18828736kB, anon-rss:8328252kB, file-rss:2328kB, shmem-rss:1832kB&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Management differences between cg oom on and off for PG databases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;on: cg oom killer will kill processes with high OOM score, typically user processes&lt;/li&gt;
&lt;li&gt;off: cg oom killer won&amp;rsquo;t start. PG processes will hang — they may recover on their own, but PG&amp;rsquo;s critical processes (like walwriter) might crash due to insufficient memory, and the instance may still go down.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: this is cg oom, not vm oom. System-level vm oom is determined by the system-level vm overcommit mechanism.&lt;/p&gt;

&lt;h3 class="relative group"&gt;cg v1 Problems
 &lt;div id="cg-v1-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cg-v1-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;No cg pagetable statistics&lt;/li&gt;
&lt;li&gt;No cg slab statistics&lt;/li&gt;
&lt;li&gt;No cg hugepage statistics (hugepages are not charged, not just not counted)&lt;/li&gt;
&lt;li&gt;No cg async/sync page reclamation statistics&lt;/li&gt;
&lt;li&gt;cg RSS and process RSS have different statistical scopes&lt;/li&gt;
&lt;li&gt;shmem statistics are messy&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;What&amp;rsquo;s New in V2
 &lt;div id="whats-new-in-v2" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#whats-new-in-v2" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;V2 Officially released in Linux 4.5 (March 2016)&lt;sup id="fnref:14"&gt;&lt;a href="#fn:14" class="footnote-ref" role="doc-noteref"&gt;14&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;cgroup v2 memory management improvements and changes:&lt;sup id="fnref:15"&gt;&lt;a href="#fn:15" class="footnote-ref" role="doc-noteref"&gt;15&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;cg mem interface file&lt;/th&gt;
 &lt;th&gt;vs v1&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.current&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Current memory usage. Removes the less useful usage_in_bytes&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.min&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;Different from VM&amp;rsquo;s min/low/high&lt;/strong&gt;. VM watermarks are about remaining OS memory; cg v2 watermarks are about cg memory used. memory.min is a hard memory protection value, default 0. Even when the system has no reclaimable memory, memory at or below this boundary won&amp;rsquo;t be reclaimed&lt;sup id="fnref:16"&gt;&lt;a href="#fn:16" class="footnote-ref" role="doc-noteref"&gt;16&lt;/a&gt;&lt;/sup&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.low&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Best-effort memory protection value, default 0. System preferentially reclaims memory from unprotected cgroups. If still insufficient, reclaims memory between memory.min and memory.low.&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.high&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Memory reclamation warning threshold, default max. When cgroup memory usage reaches high, triggers synchronous memory reclamation for this cgroup and children, trying to keep memory below high&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.max&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Equivalent to memory.limit_in_bytes&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;memory.reclaim&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Active reclamation interface file. v1 only had memory.force_empty for forced clearing&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.peak&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Equivalent to max_usage_in_bytes; exceeding peak triggers cg oom killer&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;memory.oom.group&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Controls whether cg OOM killer terminates the entire cgroup (1) or just a single process (0). Default 0. If oom_score_adj=-1000, process won&amp;rsquo;t be killed&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.events&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Reports memory-related events&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;memory.stat&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Reworked&lt;/td&gt;
 &lt;td&gt;Many changes, analyzed separately&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;memory.zswap.current, memory.zswap.max, memory.zswap.writeback&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;td&gt;Zswap is a compressed swap mechanism in the Linux kernel. Through compressing memory pages awaiting swap, it reduces disk I/O operations, improving system performance. Its core idea is to compress swap data that would have been written to disk and temporarily store it in memory, only writing data to physical swap devices (like swap partitions or files) when necessary&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;soft_limit_in_bytes&lt;/td&gt;
 &lt;td&gt;Removed&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;memory.oom_control&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Removed&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;This means v2 cannot directly disable cg oom killer&lt;/strong&gt;; however, fine-grained memory management can be achieved through min/low/high settings and event memory notifications&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;v2 cg mem management advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Compared to v1, v2 has simpler and clearer hierarchical management&lt;/li&gt;
&lt;li&gt;v1 only had OOM kill or freeze; v2 has more means to control memory size (such as memory.min/low/high)&lt;/li&gt;
&lt;li&gt;v2 makes it easier to control burst loads&lt;sup id="fnref:17"&gt;&lt;a href="#fn:17" class="footnote-ref" role="doc-noteref"&gt;17&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;li&gt;Removes the interface file for directly disabling cg oom killer&lt;/li&gt;
&lt;li&gt;Adds memory_hugetlb_accounting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;memory.stat:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;strong&gt;Parameter&lt;/strong&gt;&lt;/th&gt;
 &lt;th&gt;&lt;strong&gt;Meaning&lt;/strong&gt;&lt;/th&gt;
 &lt;th&gt;&lt;strong&gt;v1 Counterpart&lt;/strong&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;anon&lt;/td&gt;
 &lt;td&gt;Anonymous pages&lt;/td&gt;
 &lt;td&gt;active_anon+inactive_anon&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;file&lt;/td&gt;
 &lt;td&gt;File pages, including tmpfs&lt;/td&gt;
 &lt;td&gt;active_file+inactive_file&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;kernel (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Total kernel memory, including kernel_stack, &lt;strong&gt;pagetables&lt;/strong&gt;, percpu, vmalloc, &lt;strong&gt;slab&lt;/strong&gt;, and other kernel memory usage.&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;kernel_stack&lt;/td&gt;
 &lt;td&gt;Memory occupied by kernel stacks.&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pagetables&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;page tables&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sec_pagetables&lt;/td&gt;
 &lt;td&gt;Secondary page tables, suitable for VMs, GPU devices, network acceleration cards, and other hardware resource isolation scenarios&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;percpu (npn)&lt;/td&gt;
 &lt;td&gt;Memory size used for per-cpu kernel data structures&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sock (npn)&lt;/td&gt;
 &lt;td&gt;network transmission buffers&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;vmalloc (npn)&lt;/td&gt;
 &lt;td&gt;vmalloc&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;shmem&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Including tmpfs, shm, shared anonymous mmap&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;zswap&lt;/td&gt;
 &lt;td&gt;Memory consumed by zswap compression itself&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;zswapped&lt;/td&gt;
 &lt;td&gt;Amount of user memory zswapped&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;file_mapped&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;mmap() size&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Somewhat similar to v1 mapped_file, though mapped_file includes tmpfs, shm&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;file_dirty&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;Same as v1 dirty&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;file_writeback&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;Same as v1 writeback&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;swapcached&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;Same as v1 swapcached&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;anon_thp&lt;/td&gt;
 &lt;td&gt;Anonymous pages in transparent huge pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;file_thp&lt;/td&gt;
 &lt;td&gt;File pages in transparent huge pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;shmem_thp&lt;/td&gt;
 &lt;td&gt;Transparent huge pages for shm, tmpfs, anonymous mmap&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;inactive_anon, active_anon, inactive_file, active_file, unevictable&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;Same as v1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;slab_reclaimable&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;slab_unreclaimable&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;slab (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;workingset_refault_anon, workingset_refault_file, workingset_activate_anon, workingset_activate_file, workingset_restore_anon, workingset_restore_file, workingset_nodereclaim&lt;/td&gt;
 &lt;td&gt;Refaulted page statistics&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pswpin (npn)&lt;/td&gt;
 &lt;td&gt;swap in&lt;/td&gt;
 &lt;td&gt;Same as v1 pgpgin&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pswpout (npn)&lt;/td&gt;
 &lt;td&gt;swap out&lt;/td&gt;
 &lt;td&gt;Same as v1 pgpgout&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgscan (npn)&lt;/td&gt;
 &lt;td&gt;scanned pages (in an inactive LRU list)&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgsteal (npn)&lt;/td&gt;
 &lt;td&gt;Reclaimed memory&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pgscan_kswapd (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pgscan_direct (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgscan_khugepaged (npn)&lt;/td&gt;
 &lt;td&gt;Pages scanned by the transparent huge page daemon&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pgscan_proactive (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Pages scanned proactively&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgsteal_kswapd (npn), pgsteal_direct (npn), pgsteal_khugepaged (npn), pgsteal_proactive (npn)&lt;/td&gt;
 &lt;td&gt;As the name suggests; pgsteal\* corresponds to pgscan\*&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgfault (npn)&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;Same as v1 pgfault&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgmajfault (npn)&lt;/td&gt;
 &lt;td&gt;As the name suggests&lt;/td&gt;
 &lt;td&gt;Same as v1 pgmajfault&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgrefill (npn)&lt;/td&gt;
 &lt;td&gt;Pages scanned in active LRU&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;pgactivate (npn)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Pages moved to active LRU&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgdeactivate (npn)&lt;/td&gt;
 &lt;td&gt;Pages moved to inactive LRU&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pglazyfree (npn)&lt;/td&gt;
 &lt;td&gt;Pages whose release is deferred when under memory pressure&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pglazyfreed (npn)&lt;/td&gt;
 &lt;td&gt;Reclaimed lazyfree pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;swpin_zero,swpout_zero&lt;/td&gt;
 &lt;td&gt;zero-filled pages; during Swap In, when the kernel detects page content is all zeros (Zero-filled), marks the page as &amp;ldquo;zero page&amp;rdquo; in metadata, skipping disk I/O&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;zswpin,zswpout,zswpwb&lt;/td&gt;
 &lt;td&gt;zswap-related pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;thp_fault_alloc (npn), thp_collapse_alloc (npn), thp_swpout (npn), thp_swpout_fallback (npn)&lt;/td&gt;
 &lt;td&gt;Transparent huge page-related pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;numa_pages_migrated (npn), numa_pte_updates (npn), numa_hint_faults (npn)&lt;/td&gt;
 &lt;td&gt;NUMA-related pages; also memory.numa_stat exists&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pgdemote_kswapd, pgdemote_direct, pgdemote_khugepaged, pgdemote_proactive&lt;/td&gt;
 &lt;td&gt;Unclear what demote means&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;hugetlb&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Huge pages&lt;/td&gt;
 &lt;td&gt;New&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;v2 cg mem observation advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Adds slab, pagetable, pgscank/pgscand/pgsteal, and huge page info — none of which v1 had&lt;/li&gt;
&lt;li&gt;More observation metrics related to specific features, such as sock, vmalloc, transparent huge pages, zswap compression interactions, swap_zero zero-fill interactions, etc.&lt;/li&gt;
&lt;li&gt;Shared memory shmem and file_mapped metrics are separated&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;wchan
 &lt;div id="wchan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wchan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Waiting Channel, name of the kernel function in which the process is sleeping&lt;/p&gt;
&lt;p&gt;Generally, you should check the wchan of processes in D state to see what kernel function the process is waiting on.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;-&lt;/code&gt;: Running tasks will display a dash (&amp;rsquo;-&amp;rsquo;) in this column&lt;/p&gt;
&lt;p&gt;&lt;code&gt;poll_schedule_timeout&lt;/code&gt;: Common for PM, usually in running state&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zz ***Fri May &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; 04:50:10 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;141378&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.5 0.4 &lt;span style="color:#ae81ff"&gt;70585180&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2322876&lt;/span&gt; poll_schedule_timeout S 21:06:18 00:02:40 /paic/postgres/base/11.3/bin/postgres -D /paic/pg6888/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Fri May &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; 04:50:43 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;141378&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.5 0.4 &lt;span style="color:#ae81ff"&gt;70585180&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2322876&lt;/span&gt; - R 21:06:18 00:02:42 /paic/postgres/base/11.3/bin/postgres -D /paic/pg6888/data&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;futex_wait_queue_me&lt;/code&gt;: Common for SLEEP processes. Occasionally D state&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;455358&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;141378&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 4.7 1.0 &lt;span style="color:#ae81ff"&gt;70590684&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5349576&lt;/span&gt; futex_wait_queue_me S 03:01:12 00:02:47 postgres: t1lzldb: lzl test3 30.181.32.3&lt;span style="color:#f92672"&gt;(&lt;/span&gt;39801&lt;span style="color:#f92672"&gt;)&lt;/span&gt; COMMIT&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;hugetlb_fault&lt;/code&gt;: Only seen when huge pages are first loaded and load starts up&lt;/p&gt;
&lt;p&gt;&lt;code&gt;do_last&lt;/code&gt;: Function in the VFS (Virtual File System) path resolution logic, responsible for handling the last component of a file path (such as filename or symbolic link) and triggering actual file operations&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lock_page_killable&lt;/code&gt;: Lock a physical memory page in an interruptible manner. &amp;ldquo;Interruptible&amp;rdquo; means the process is allowed to respond to fatal signals like &lt;code&gt;SIGKILL&lt;/code&gt; while waiting for the page lock&lt;/p&gt;
&lt;p&gt;&lt;code&gt;rpc_wait_bit_killable&lt;/code&gt;: This function relates to the Remote Procedure Call (RPC) mechanism, used in the kernel to wait for changes to certain bit flags&lt;/p&gt;
&lt;p&gt;&lt;code&gt;wait_on_page_bit&lt;/code&gt;: Wait for changes to page flag states (e.g., PG_locked, PG_writeback)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;blkdev_issue_flush&lt;/code&gt;: Block device layer cache flush function. Possible call chain: user calls &lt;code&gt;fsync()&lt;/code&gt; → file system (e.g., ext4) submits relevant dirty pages to the block device layer → calls &lt;code&gt;blkdev_issue_flush()&lt;/code&gt; to ensure device cache is flushed&lt;/p&gt;
&lt;p&gt;&lt;code&gt;on_proc_exit&lt;/code&gt;: Register cleanup functions for process exit&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ima_file_check&lt;/code&gt;: Belongs to the IMA (Integrity Measurement Architecture) subsystem, used to verify file integrity during file access; typically involved with &lt;code&gt;open()&lt;/code&gt; calls&lt;/p&gt;
&lt;p&gt;&lt;code&gt;flush_work&lt;/code&gt;: Wait for task completion&lt;/p&gt;
&lt;p&gt;&lt;code&gt;call_rwsem_down_write_failed&lt;/code&gt;: When attempting to acquire a write lock (&lt;code&gt;down_write()&lt;/code&gt;) fails, this function handles write lock contention and waiting logic. It uses spin or sleep mechanisms to make the current process wait for lock release (rwsem: read-write semaphore)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;get_request&lt;/code&gt;: &lt;strong&gt;Appears when iowait is high&lt;/strong&gt;. Gets a free request structure (&lt;code&gt;struct request&lt;/code&gt;) from the block device request queue. If the queue is full (device processing speed insufficient), the thread waits until a request is available&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lookup_slow&lt;/code&gt;: Slow path for VFS (Virtual File System) path resolution&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/**
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * lookup_fast - do fast lockless (but racy) lookup of a dentry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * @nd: current nameidata
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Do a fast, but racy lookup in the dcache for the given dentry, and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * revalidate it. Returns a valid dentry pointer or NULL if one wasn&amp;#39;t
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * found. On error, an ERR_PTR will be returned.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;lookup_fast&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; nameidata &lt;span style="color:#f92672"&gt;*&lt;/span&gt;nd)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Fast lookup failed, do it the slow way */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;__lookup_slow&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; qstr &lt;span style="color:#f92672"&gt;*&lt;/span&gt;name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;dir,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; flags)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;lookup_slow&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; qstr &lt;span style="color:#f92672"&gt;*&lt;/span&gt;name,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;dir,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; flags)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; inode &lt;span style="color:#f92672"&gt;*&lt;/span&gt;inode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; dir&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_inode;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dentry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;res;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;inode_lock_shared&lt;/span&gt;(inode);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	res &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;__lookup_slow&lt;/span&gt;(name, dir, flags);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;inode_unlock_shared&lt;/span&gt;(inode);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; res;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;lookup_fast and lookup_slow both search for dentries and return them. lookup_fast searches in the dentry cache; if it fails, lookup_slow is used.&lt;/p&gt;
&lt;p&gt;Stress testing with huge pages enabled, no direct memory reclamation, the following events occurred:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;lock_page&lt;/code&gt;: &lt;strong&gt;Appears when iowait is high&lt;/strong&gt;. When the kernel attempts to lock a memory page, if the page is already locked by another thread/process, the current thread enters a waiting state.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vx_svar_sleep_unlock&lt;/code&gt;, &lt;code&gt;vx_ilock&lt;/code&gt;, &lt;code&gt;vx_bc_biowait&lt;/code&gt;, &lt;code&gt;vx_dio_physio&lt;/code&gt;, &lt;code&gt;vx_rwsleep_lock&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;vx is a journaling &lt;strong&gt;file system&lt;/strong&gt; developed by Veritas (now owned by Symantec and subsequently spun off as Veritas Technologies), designed for high-performance, high-availability large-scale data storage, &lt;strong&gt;primarily targeting enterprise application scenarios&lt;/strong&gt;. Like xfs and ext4, it is a type of file system.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pipe_wait&lt;/code&gt;: When a process attempts to read from or write to a pipe, if the pipe buffer is full (write operation) or empty (read operation), the current thread enters sleep state, waiting for buffer state changes&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pipe_write&lt;/code&gt;: Entry function for pipe write operations. When the buffer is full, the thread sleeps in this function, waiting for writable space&lt;/p&gt;
&lt;p&gt;&lt;code&gt;congestion_wait&lt;/code&gt;: When the block device I/O queue is congested (e.g., request queue full or device processing delayed), the kernel uses this function to briefly sleep the thread&lt;/p&gt;
&lt;p&gt;&lt;code&gt;wait_iff_congested&lt;/code&gt;: Checks whether the block device queue is congested and enters brief sleep if so. Similar to &lt;code&gt;congestion_wait&lt;/code&gt; but more lightweight, typically used in memory reclamation or dirty page writeback paths&lt;/p&gt;
&lt;p&gt;&lt;code&gt;mem_cgroup_oom_synchronize&lt;/code&gt;: When &lt;code&gt;usage_in_bytes&lt;/code&gt; reaches &lt;code&gt;limit_in_bytes&lt;/code&gt;, marks oom_control.under_oom=1. Whether the OOM killer kernel module is activated depends on oom_control.oom_kill_disable&lt;/p&gt;
&lt;p&gt;&lt;code&gt;mem_cgroup_oom&lt;/code&gt;: Same as &lt;code&gt;mem_cgroup_oom_synchronize&lt;/code&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;rmap_walk
 &lt;div id="rmap_walk" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rmap_walk" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;One of PFRA&amp;rsquo;s goals is to reclaim shared page frames. To achieve this, the Linux 2.6 kernel can quickly locate all page table entries pointing to the same page frame — this process is called reverse mapping[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)].&lt;/p&gt;
&lt;p&gt;When a page frame already referenced by one process is inserted into another process&amp;rsquo;s page table entries (fork), rmap_walk should also occur&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zcat hostlzl_ps_25.04.08.0900.dat.gz|egrep &lt;span style="color:#e6db74"&gt;&amp;#34;\-D /dirlzl/pg5998/data|zzz&amp;#34;&lt;/span&gt;|less
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:10:50 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.2 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117844&lt;/span&gt; poll_schedule_timeout S 22:17:21 00:01:56 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:11:20 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.2 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117844&lt;/span&gt; poll_schedule_timeout S 22:17:21 00:01:56 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:13:08 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.2 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117844&lt;/span&gt; - D 22:17:21 00:01:57 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;225076&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 1.6 0.0 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1720&lt;/span&gt; rmap_walk D 09:11:51 00:00:01 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;224924&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.7 0.0 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1728&lt;/span&gt; rmap_walk D 09:11:46 00:00:00 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;224817&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.5 0.0 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1720&lt;/span&gt; try_to_unmap_file D 09:11:44 00:00:00 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:19:16 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.3 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117884&lt;/span&gt; poll_schedule_timeout S 22:17:21 00:02:00 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;250875&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.0 0.0 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2208&lt;/span&gt; - R 09:19:17 00:00:00 /dirlzl/postgres/base/postgressqlbin/postgresdb -D /dirlzl/pg5998/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;zzz ***Tue Apr &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; 09:19:48 CST &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;209987&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 0.3 0.5 &lt;span style="color:#ae81ff"&gt;70247548&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2117884&lt;/span&gt; poll_schedule_timeout S 22:17:21 00:02:01 /dirlzl/postgres/base/postgressql/bin/postgresdb -D /dirlzl/pg5998/data&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;try_to_unmap_file
 &lt;div id="try_to_unmap_file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#try_to_unmap_file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The try_to_unmap_file() function calls try_to_unmap_cluster(), and try_to_unmap_cluster() scans all page table entries corresponding to linear addresses in that linear region, attempting to clear them[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]. try_to_unmap_file() performs reverse mapping of mapped pages. Note: reverse mapping means finding all VMAs through the page table and reclaiming shared physical page frames.&lt;/p&gt;

&lt;h3 class="relative group"&gt;page_referenced
 &lt;div id="page_referenced" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page_referenced" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;referenced and active are used to control page activity level and are used in page reclamation. When refcount=0, it indicates free pages or pages about to be released[^《奔跑吧 Linux内核 入门篇（第2版）》 (Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition)].&lt;/p&gt;
&lt;p&gt;In kernel.org doc&amp;rsquo;s Object-Based Reverse Mapping, there is a description of the page_referenced() function&lt;sup id="fnref1:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;code&gt;page_referenced()&lt;/code&gt; which checks all PTEs that map a page to see if the page has been referenced recently&lt;/p&gt;
&lt;p&gt;&lt;code&gt;page_referenced()&lt;/code&gt; calls &lt;code&gt;page_referenced_obj()&lt;/code&gt; which is the top level function for finding all PTEs within VMAs that map the page.&lt;/p&gt;
&lt;p&gt;If a page is mapped and it is referenced through the mapping, index hash table, this bit is set. It is used during page replacement for moving the page around the LRU lists&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In short, page_referenced() finds all PTEs&amp;rsquo; VMAs that map a page through the page frame. This is also a reverse mapping process.&lt;/p&gt;
&lt;p&gt;Linux introduced two page flags, &lt;code&gt;PG_active&lt;/code&gt; and &lt;code&gt;PG_referenced&lt;/code&gt;, to identify the activity level of pages, thereby deciding how to move pages between two lists (active LRU and inactive LRU).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4fd31681c3a0.png" alt="pic" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PG_active&lt;/code&gt; is used to indicate whether the page is currently active — if this bit is set, the page is active. &lt;code&gt;PG_referenced&lt;/code&gt; is used to indicate whether the page has been accessed recently — each time the page is accessed, this bit is set.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;page_referenced()&lt;/code&gt;: &lt;strong&gt;When the operating system performs page reclamation&lt;/strong&gt;, each time a page is scanned, this function is called to set the page&amp;rsquo;s &lt;code&gt;PG_referenced&lt;/code&gt; bit. If a page&amp;rsquo;s &lt;code&gt;PG_referenced&lt;/code&gt; bit is set but the page is not accessed again within a certain time, its &lt;code&gt;PG_referenced&lt;/code&gt; bit will be cleared.&lt;sup id="fnref:18"&gt;&lt;a href="#fn:18" class="footnote-ref" role="doc-noteref"&gt;18&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Observation Metrics
 &lt;div id="memory-observation-metrics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-observation-metrics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;View basic memory settings:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7c22bffd37cf.png" alt="image.png" /&gt;
Observe memory metrics:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4fed9a1d93d7.png" alt="image.png" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Some Questions
 &lt;div id="some-questions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#some-questions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Do kswapd and Direct Memory Reclamation Execute Together?
 &lt;div id="do-kswapd-and-direct-memory-reclamation-execute-together" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#do-kswapd-and-direct-memory-reclamation-execute-together" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Yes. If it&amp;rsquo;s watermark-triggered memory reclamation, pgscand is often accompanied by pgscank; the reverse is not necessarily true. If both pgscank and pgscand are frequent, consider adjusting memory reclamation watermarks, increasing the delta to prevent it from being quickly breached.&lt;/p&gt;
&lt;p&gt;However, there&amp;rsquo;s another case: when fragmentation rate is high and free memory is still plentiful, blocking memory compaction may be directly triggered with pgscand but no pgscank at all. In this case, adjusting watermarks won&amp;rsquo;t help. Consider enabling huge page memory and increasing shared buffer hit rate to reduce frequent pagecache allocation that fragments memory.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Impact of Oversized pagetable on Memory Reclamation
 &lt;div id="impact-of-oversized-pagetable-on-memory-reclamation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#impact-of-oversized-pagetable-on-memory-reclamation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;An oversized pagetable increases the cost and time of reverse mapping. During direct memory reclamation, reverse mapping is needed to find all processes&amp;rsquo; virtual address spaces (VMAs), then cancel the VMA page table mappings of all processes. This means: the more processes, the larger the pagetable, and the slower the memory reclamation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The more PostgreSQL processes, the larger the pagetable; the larger shared buffer, the larger the pagetable.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Enabling huge page memory can reduce pagetable size by 500x (4k=&amp;gt;2M), not only freeing up memory but also improving memory reclamation efficiency.&lt;/p&gt;

&lt;h3 class="relative group"&gt;How Large Should shared buffers Be?
 &lt;div id="how-large-should-shared-buffers-be" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-large-should-shared-buffers-be" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;sharedbuffers = 1/4 cgmem seems to have become an industry standard, but the actual situation is far more complex. Theoretically, reducing sharedbuffers a bit can increase pagecache a bit, actually slightly increasing total cache size. Increasing sharedbuffers a bit slightly reduces total cache size but improves sharedbuffer hit rate somewhat. Clearly, making sharedbuffers too large is bad, and making it too small is also bad. If sharedbuffers is too small, PG&amp;rsquo;s own working memory becomes too small, effectively offloading memory management to the OS — OS pagecache reclamation will also affect performance. If sharedbuffers is too large, not only is pagecache squeezed, but PG&amp;rsquo;s dirty page flushing impact must also be considered, especially for write-heavy scenarios where corresponding bgwriter parameters need adjustment.&lt;/p&gt;
&lt;p&gt;From rough stress testing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Without huge pages, shared buffers = min(1/4 MEM, 20GB)&lt;/li&gt;
&lt;li&gt;With huge pages, shared buffers = min(1/4 MEM, 60GB)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Is the Difference Between Processes and Threads Really Not That Big?
 &lt;div id="is-the-difference-between-processes-and-threads-really-not-that-big" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#is-the-difference-between-processes-and-threads-really-not-that-big" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Any Linux kernel material will say that the difference between processes and threads is not significant. Whether creating a process or a thread, the kernel uses the same function, kernel_clone, to implement it. The only difference lies in the parameters passed. The fork and clone system calls are roughly the same[^ 《深入理解Linux进程和内存》 (Understanding Linux Processes and Memory)]:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fdb4c651f034.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;strong&gt;Dimension&lt;/strong&gt;&lt;/th&gt;
 &lt;th&gt;&lt;strong&gt;Process&lt;/strong&gt;&lt;/th&gt;
 &lt;th&gt;&lt;strong&gt;Thread&lt;/strong&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;childID&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Each process has an independent &lt;code&gt;pid&lt;/code&gt; (process ID)&lt;/td&gt;
 &lt;td&gt;Each thread has a &lt;code&gt;tid&lt;/code&gt; (thread ID), but the thread&amp;rsquo;s &lt;code&gt;pid&lt;/code&gt; is the same as its process&amp;rsquo;s &lt;code&gt;pid&lt;/code&gt;.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;Address Space&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Each process has an independent address space (&lt;code&gt;mm_struct&lt;/code&gt;), including memory, stack, etc.&lt;/td&gt;
 &lt;td&gt;Threads share the address space of their process; all threads&amp;rsquo; &lt;code&gt;mm_struct&lt;/code&gt; points to the same address space.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;File System&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;Each process has its own &lt;code&gt;fs_struct&lt;/code&gt;, including file descriptors, mount points, etc.&lt;/td&gt;
 &lt;td&gt;Threads share their process&amp;rsquo;s &lt;code&gt;fs_struct&lt;/code&gt;; all threads&amp;rsquo; file descriptors and mount points are the same as the process.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Compared to processes, threads are only slightly &amp;ldquo;lighter&amp;rdquo;. Overall, the similarities between processes and threads outweigh their differences.&lt;/p&gt;
&lt;p&gt;However, when the number of processes increases, the difference becomes significant, especially for multi-process applications like PostgreSQL:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each process has its own VMA, so more address spaces need to be maintained&lt;/li&gt;
&lt;li&gt;Each process has its own pagetable, so pagetables consume more memory&lt;/li&gt;
&lt;li&gt;Multiple processes increase TLB flush overhead, while threads do not&lt;/li&gt;
&lt;li&gt;Process switching requires more context switch overhead, while threads do not&lt;/li&gt;
&lt;li&gt;Inter-process communication (IPC) is less efficient, while threads can directly share memory without IPC communication issues&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You could say: &lt;strong&gt;processes and threads don&amp;rsquo;t differ much at creation time, but multi-process management and multi-thread management differ greatly&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Does the Standby Have PG-Level Dirty Pages?
 &lt;div id="why-does-the-standby-have-pg-level-dirty-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-does-the-standby-have-pg-level-dirty-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The standby&amp;rsquo;s WAL replay mechanism itself generates dirty pages, and the standby also flushes dirty pages. You can view standby dirty pages through pg_buffercache. The standby&amp;rsquo;s dirty pages are different from the primary&amp;rsquo;s — standby dirty data is also just regular relations. You can also observe that the standby&amp;rsquo;s checkpoint/bgwriter/backend dirty flushing is different from the primary&amp;rsquo;s.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Is File Cache Higher on Some Databases and Lower on Others?
 &lt;div id="why-is-file-cache-higher-on-some-databases-and-lower-on-others" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-is-file-cache-higher-on-some-databases-and-lower-on-others" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Generally, databases with high data dispersion have more file cache. Simple slow SQL queries are unlikely to maintain high file cache levels long-term. A slow SQL query accessing lots of data might briefly raise filecache, but after a while, these file pages&amp;rsquo; reference count drops, becoming inactive file pages, and memory can reclaim this portion. However, frequent data dispersion — such as when an index&amp;rsquo;s correlation approaches 0 (like a UUID primary key) — results in decent SQL performance but high reads, potentially generating frequent physical IO and loading too many pages into filecache. Even changes in business patterns can cause a large amount of shared buffer swapping in and out, significantly impacting performance.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PG Processes and Shared Memory Mapping
 &lt;div id="pg-processes-and-shared-memory-mapping" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg-processes-and-shared-memory-mapping" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Without huge pages: /dev/zero (deleted)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/102208/smaps |egrep &lt;span style="color:#e6db74"&gt;&amp;#34;rw\-s&amp;#34;&lt;/span&gt; -A &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aefd8901000-2aefd8902000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;1202061313&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aefd8918000-2aefd898f000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:13 &lt;span style="color:#ae81ff"&gt;4084862058&lt;/span&gt; /dev/shm/PostgreSQL.1008001451
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;476&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aefe2605000-2b00ad129000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;4084864418&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#With huge pages: /anon_hugepage (deleted)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/29091/smaps |egrep &lt;span style="color:#e6db74"&gt;&amp;#34;rw\-s&amp;#34;&lt;/span&gt; -A &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2aaaaac00000-2ac3a2c00000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:0e &lt;span style="color:#ae81ff"&gt;215471503&lt;/span&gt; /anon_hugepage &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;104726528&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b48dfe93000-2b48dfe94000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;88604727&lt;/span&gt; /SYSV00001000 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b48dfeab000-2b48dff22000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:12 &lt;span style="color:#ae81ff"&gt;215515747&lt;/span&gt; /dev/shm/PostgreSQL.1123685558
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;476&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Child process page tables are all copied from the parent process; parent and child processes therefore share the same page frames[^ 《深入理解Linux内核》 (Understanding the Linux Kernel)]. So whether it&amp;rsquo;s the postmaster or backend processes (any process forked from postmaster), they all map the same shared memory address in their virtual memory — their addresses and Size in smaps are equal.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Do All PG Processes Have /dev/zero as the Largest Segment in Virtual Memory?
 &lt;div id="why-do-all-pg-processes-have-devzero-as-the-largest-segment-in-virtual-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-do-all-pg-processes-have-devzero-as-the-largest-segment-in-virtual-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are two main ways to implement anonymous page mapping with mmap: one is by setting the &lt;code&gt;MAP_ANONYMOUS&lt;/code&gt; flag with &lt;code&gt;fd=-1&lt;/code&gt;, and the other is by opening the &lt;code&gt;/dev/zero&lt;/code&gt; device file and passing the resulting file descriptor to &lt;code&gt;mmap&lt;/code&gt;. These two methods are functionally equivalent.&lt;/p&gt;
&lt;p&gt;PG shared buffers use the &lt;code&gt;/dev/zero&lt;/code&gt; device mapping to implement anonymous shared pages, which is why you typically see PG processes having a large proportion of their virtual memory address space as &lt;code&gt;/dev/zero&lt;/code&gt;.&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;[Understanding the Linux Kernel]: Understanding the Linux Kernel: Memory Addressing, Memory Management, Address Space Management, Page Frame Reclamation&lt;/p&gt;
&lt;p&gt;[Understanding Linux Processes and Memory]: Understanding Linux Processes and Memory: CPU Hardware Principles, Process and Thread Comparison&lt;/p&gt;
&lt;p&gt;[Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition]: Running Linux Kernel: Beginner&amp;rsquo;s Guide 2nd Edition: System Calls, Memory Management&lt;/p&gt;
&lt;hr&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;&lt;a href="https://www.cs.oslomet.no/~haugerud/os/Forelesning/os7.pdf" target="_blank" rel="noreferrer"&gt;https://www.cs.oslomet.no/~haugerud/os/Forelesning/os7.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;&lt;a href="https://www.cs.unc.edu/~porter/courses/comp630/s24/slides/pfra.pdf" target="_blank" rel="noreferrer"&gt;https://www.cs.unc.edu/~porter/courses/comp630/s24/slides/pfra.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/gorman/html/understand/index.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/gorman/html/understand/index.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:4"&gt;
&lt;p&gt;&lt;a href="https://courses.cs.washington.edu/courses/cse333/20wi/lectures/07/CSE333-L07-posix_20wi.pdf" target="_blank" rel="noreferrer"&gt;https://courses.cs.washington.edu/courses/cse333/20wi/lectures/07/CSE333-L07-posix_20wi.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:5"&gt;
&lt;p&gt;&lt;a href="https://www.sohu.com/a/392831824_467784" target="_blank" rel="noreferrer"&gt;https://www.sohu.com/a/392831824_467784&lt;/a&gt;&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:6"&gt;
&lt;p&gt;&lt;a href="https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/configuring-an-operating-system-to-optimize-memory-access_monitoring-and-managing-system-status-and-performance#overview-of-a-systems-memory_configuring-an-operating-system-to-optimize-memory-access" target="_blank" rel="noreferrer"&gt;redhat,Configuringanoperatingsystemtooptimizememoryaccess&lt;/a&gt;&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:7"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#swappiness" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#swappiness&lt;/a&gt;&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:8"&gt;
&lt;p&gt;&lt;a href="https://access.redhat.com/solutions/6785021" target="_blank" rel="noreferrer"&gt;https://access.redhat.com/solutions/6785021&lt;/a&gt;&amp;#160;&lt;a href="#fnref:8" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:9"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/Documentation/vm/overcommit-accounting" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/Documentation/vm/overcommit-accounting&lt;/a&gt;&amp;#160;&lt;a href="#fnref:9" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:10"&gt;
&lt;p&gt;&lt;a href="https://carlyleliu.github.io/LinuxKernel/LinuxMemoryOptimization/" target="_blank" rel="noreferrer"&gt;https://carlyleliu.github.io/LinuxKernel/LinuxMemoryOptimization/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:10" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:11"&gt;
&lt;p&gt;&lt;a href="https://www.man7.org/linux/man-pages/man5/proc_pid_oom_score.5.html" target="_blank" rel="noreferrer"&gt;https://www.man7.org/linux/man-pages/man5/proc_pid_oom_score.5.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:11" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:12"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/memory.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:12" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:12" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref2:12" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:13"&gt;
&lt;p&gt;&lt;a href="https://wiki.goframe.org/pages/viewpage.action?pageId=157646868" target="_blank" rel="noreferrer"&gt;https://wiki.goframe.org/pages/viewpage.action?pageId=157646868&lt;/a&gt;&amp;#160;&lt;a href="#fnref:13" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:14"&gt;
&lt;p&gt;&lt;a href="https://www.man7.org/conf/lca2019/cgroups_v2-LCA2019-Kerrisk.pdf" target="_blank" rel="noreferrer"&gt;https://www.man7.org/conf/lca2019/cgroups_v2-LCA2019-Kerrisk.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:14" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:15"&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:15" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:16"&gt;
&lt;p&gt;&lt;a href="https://support.huaweicloud.com/usermanual-hce/hce_02_0072.html" target="_blank" rel="noreferrer"&gt;https://support.huaweicloud.com/usermanual-hce/hce_02_0072.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:16" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:17"&gt;
&lt;p&gt;&lt;a href="https://chrisdown.name/talks/cgroupv2/cgroupv2-fosdem.pdf" target="_blank" rel="noreferrer"&gt;https://chrisdown.name/talks/cgroupv2/cgroupv2-fosdem.pdf&lt;/a&gt;&amp;#160;&lt;a href="#fnref:17" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:18"&gt;
&lt;p&gt;&lt;a href="https://www.cnblogs.com/muahao/p/10109712.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/muahao/p/10109712.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:18" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title>My 2024 Year-End Summary</title><link>https://lastdba.com/en/2025/01/11/my-2024-year-end-summary/</link><pubDate>Sat, 11 Jan 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/01/11/my-2024-year-end-summary/</guid><description>&lt;h2 class="relative group"&gt;As a DBA
 &lt;div id="as-a-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#as-a-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;2023 was a year of comprehensive PostgreSQL learning for me, and 2024 has been a year of comprehensive PostgreSQL operations. There&amp;rsquo;s actually a lot of material I really want to dive into but haven&amp;rsquo;t had the time. This year was mainly case analysis — I could only supplement my foundational knowledge here and there.&lt;/p&gt;
&lt;p&gt;Mid-year there was a discussion about &amp;ldquo;will DBAs be eliminated in the cloud era.&amp;rdquo; This discussion left a deep impression on me. I thought about many things afterward — why do others seem to have so few things to deal with while I, as a DBA, have so much? I even went into cloud computing groups to debate about it, and I actually gained something from it. Different perspectives lead to unexpected conclusions. The conclusion of the debate may boil down to just one thing: DBAs are providing 1510 emotional value to their leaders.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;As a DBA
 &lt;div id="as-a-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#as-a-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;2023 was a year of comprehensive PostgreSQL learning for me, and 2024 has been a year of comprehensive PostgreSQL operations. There&amp;rsquo;s actually a lot of material I really want to dive into but haven&amp;rsquo;t had the time. This year was mainly case analysis — I could only supplement my foundational knowledge here and there.&lt;/p&gt;
&lt;p&gt;Mid-year there was a discussion about &amp;ldquo;will DBAs be eliminated in the cloud era.&amp;rdquo; This discussion left a deep impression on me. I thought about many things afterward — why do others seem to have so few things to deal with while I, as a DBA, have so much? I even went into cloud computing groups to debate about it, and I actually gained something from it. Different perspectives lead to unexpected conclusions. The conclusion of the debate may boil down to just one thing: DBAs are providing 1510 emotional value to their leaders.&lt;/p&gt;
&lt;p&gt;Right or wrong, you can see reflections on the DBA profession in many of my articles this year. Continuing down the traditional DBA path is certainly a dead end. Today&amp;rsquo;s DBAs lean more toward business data layer operations, or moving up to architecture design. Positions for expert DBAs focused purely on databases are actually very few.&lt;/p&gt;

&lt;h2 class="relative group"&gt;READING
 &lt;div id="reading" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reading" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4961d50ee1c3.jpg" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Let me reiterate why I&amp;rsquo;m so devoted to reading (I said this in 2023 too&amp;hellip;):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The value brought by reading is immeasurable in the short term&lt;/li&gt;
&lt;li&gt;Reading brings a pleasant sense of intellectual enrichment&lt;/li&gt;
&lt;li&gt;Learning is a belief. Yuval Harari has a view: believing in science is actually also a form of faith. I choose to believe in this faith, at least in 2024 and the foreseeable future.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;My book list roughly falls into three categories: PostgreSQL, broader technical scope, and extracurricular. Some are in Chinese, some in English. Some are physical books, some electronic.&lt;/p&gt;
&lt;p&gt;This year, let me continue with a reading list ranking. Horizontal comparison across different categories is a bit of a stretch, so let&amp;rsquo;s compare within categories. Once again, note: these book lists are for books I aimed to &amp;ldquo;finish cover to cover.&amp;rdquo; Books used as references don&amp;rsquo;t count here.&lt;/p&gt;
&lt;p&gt;2024 PostgreSQL Book List (ranked by preference):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&amp;ldquo;PostgreSQL Database Kernel Analysis&amp;rdquo; — clear thinking and framework, though the version is a bit old&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Quickly Mastering PostgreSQL Version New Features&amp;rdquo; — this should be my favorite PostgreSQL book this year, because it has &lt;strong&gt;zero fluff&lt;/strong&gt; throughout, a pleasure to read&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The Internals of PostgreSQL&amp;rdquo; — I originally wanted to put this first, but since there&amp;rsquo;s a free online version at interdb, I wouldn&amp;rsquo;t even recommend buying this book. It&amp;rsquo;s ranked here because interdb is so excellent — its substitute is enshrined here as a deity&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The Way of PostgreSQL: From Apprentice to Expert, 2nd Edition&amp;rdquo; — very detailed but also very long. I recommend skimming through quickly to find the key points without lingering too long&lt;/li&gt;
&lt;li&gt;&amp;ldquo;PostgreSQL Technical Internals: Transaction Processing Deep Dive&amp;rdquo; — transactions are the foundation of PostgreSQL, and also the foundation of my source code journey&lt;/li&gt;
&lt;li&gt;&amp;ldquo;PostgreSQL in Action&amp;rdquo; — the practical examples are well worth referencing&lt;/li&gt;
&lt;li&gt;&amp;ldquo;PostgreSQL 16 Administration Cookbook&amp;rdquo; — not recommended. The table of contents framework looks good, but the content is hollow. Don&amp;rsquo;t waste time on this book.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;2024 Broader Technical Scope Book List (ranked by preference):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&amp;ldquo;DDIA-v2: Designing Data-Intensive Applications (2nd Edition)&amp;rdquo; — so good I don&amp;rsquo;t know where to begin. So excellent that I specially wrote &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/%E8%AF%BB%E4%B9%A6%E7%AC%94%E8%AE%B0/%E8%AF%BB%E4%B9%A6%E7%AC%94%E8%AE%B0%E2%80%94%E2%80%94DDIA-v2%20%E8%AE%BE%E8%AE%A1%E6%95%B0%E6%8D%AE%E5%AF%86%E9%9B%86%E5%9E%8B%E5%BA%94%E7%94%A8%EF%BC%88%E7%AC%AC%E4%BA%8C%E7%89%88%EF%BC%89.md" target="_blank" rel="noreferrer"&gt;reading notes&lt;/a&gt; (my only book notes article this year). I wish I had found it sooner.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;A Brief History of Databases&amp;rdquo; — reading history truly brings insight. The story of databases begins here. Some technical things become clearer in hindsight.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;ITIL 4 and DevOps Service Management Certification Guide (2nd Edition)&amp;rdquo; — a classic in IT service management. It elevated my understanding of the operations role — how did these things so closely tied to my work come about? Which parts don&amp;rsquo;t match reality, and why weren&amp;rsquo;t they applied? You can grasp many things from it.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Cloud Native Kubernetes&amp;rdquo; — hardcore, another track entirely&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Docker Deep Dive&amp;rdquo; — decent for understanding containers and container history. The container knowledge itself isn&amp;rsquo;t actually that much.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Brother Bird&amp;rsquo;s Linux Private Kitchen&amp;rdquo; — sorry, I genuinely hadn&amp;rsquo;t read this classic. Came to catch up. The writing approach is well worth learning from. The drawback is that much of it isn&amp;rsquo;t useful for my role.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Machine Learning&amp;rdquo; — ranked here not because the book is bad, but because it&amp;rsquo;s very hard to understand. I gave up about a quarter of the way through. This book showed me the upper limits of my intelligence, and I&amp;rsquo;m sad about it.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Building a Vector Database from Scratch&amp;rdquo; — if you want to read source code, go to GitHub&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Deep Understanding of Go Language&amp;rdquo; — understood nothing at all&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;2024 Extracurricular Book List (ranked by preference):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&amp;ldquo;Cancer Ward&amp;rdquo; — I finished this early in the first half of the year. While reading it, I felt: barring surprises, this book would rank first this year. Nobel Prize in Literature, well-deserved.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Intimate Relationships&amp;rdquo; — understanding relationships with lovers, friends, and bosses. Academic paper style, solid, I like it.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Does God Play Dice? A History of Quantum Physics&amp;rdquo; — setting aside everything else, the writing style provides immense emotional value, making me want to keep reading. I finished it in just a few days.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The Worlds I See&amp;rdquo; — AI pioneer Fei-Fei Li&amp;rsquo;s autobiography. The story of a girl who grew up in Chengdu venturing into the melting pot of America, eventually leading Google AI, while also narrating the history of AI development.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;21 Lessons for the 21st Century&amp;rdquo; — the final installment of Yuval Harari&amp;rsquo;s trilogy. I loved the first two books, but this one felt just okay. At least it brought closure.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The Old Man and the Sea&amp;rdquo; — hard to evaluate. I like its temperament, but not its content.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The Wandering Earth&amp;rdquo; — this is a collection of Liu Cixin&amp;rsquo;s short stories. One day at the library, I bought it because of the first short story. After buying it, I found the other short stories to be very boring and childish. I felt cheated.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Journey to the West&amp;rdquo; — hot take: they can&amp;rsquo;t even explain Tang Sanzang&amp;rsquo;s background properly. A mess, completely confused. I gave up after a little bit. (My evaluation of &amp;ldquo;Romance of the Three Kingdoms&amp;rdquo; last year was very high.)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Blog and WeChat Official Account
 &lt;div id="blog-and-wechat-official-account" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#blog-and-wechat-official-account" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;2024 Published Articles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PostgreSQL technical: 21&lt;/li&gt;
&lt;li&gt;Other technical: 2&lt;/li&gt;
&lt;li&gt;Book notes: 1&lt;/li&gt;
&lt;li&gt;Useless articles: 1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I only wrote 25 articles this year, a noticeable decrease from last year.&lt;/p&gt;
&lt;p&gt;WeChat Official Account followers: 600. Though not many, I believe every single one has good taste &amp;#x1f638;&lt;/p&gt;
&lt;p&gt;Writing technical articles is actually quite tiring — it takes far more time than one would imagine. However, you genuinely learn things during the writing process, and the sense of accomplishment from completing a piece is real. Since I feel responsible for my articles, I won&amp;rsquo;t write recklessly about things I don&amp;rsquo;t understand. As for errors arising from misunderstandings, that&amp;rsquo;s actually normal. No one can guarantee that their future self won&amp;rsquo;t criticize their current self — just write correctly for the current state.&lt;/p&gt;
&lt;p&gt;In terms of writing content this year, I gave up writing reading notes for extracurricular books. I wrote quite a few last year, but writing reading notes takes a lot of time with very low value. Low emotional value tasks naturally get abandoned. In fact, my writing content varies each year. Currently, PostgreSQL database technical articles are the only constant — other types aren&amp;rsquo;t as stable. This is normal. The blog was originally meant for database writing. If there&amp;rsquo;s no application scenario for other domains, I won&amp;rsquo;t touch them again after the brief exploratory period.&lt;/p&gt;
&lt;p&gt;One more complaint: domestic blogging platforms only care about article quantity, which is completely at odds with my writing style. Each of my articles is tens of thousands of hand-typed characters. I&amp;rsquo;m a quality-over-quantity blogger. So I can&amp;rsquo;t be bothered anymore — I&amp;rsquo;m planning to abandon CSDN in 2025 and just post on GitHub and my WeChat Official Account.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve been writing on CSDN since 2017. When I first started blogging, there weren&amp;rsquo;t many good blog hosting platforms. Looking at CSDN now: community interaction is zero, and the vast majority of articles on it are terrible. Even I don&amp;rsquo;t want to find CSDN articles myself. It&amp;rsquo;s like a first love of 7-8 years — sometimes you just have to break up.&lt;/p&gt;
&lt;p&gt;2024 Publication Channels:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CSDN Blog: &lt;a href="https://liuzhilong.blog.csdn.net/" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Modb.pro: liuzhilong62&lt;/li&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/liuzhilong62/blogs" target="_blank" rel="noreferrer"&gt;https://github.com/liuzhilong62/blogs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;WeChat Official Account: 破斯特贵斯库儿&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Expected 2025 Channels:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/liuzhilong62/blogs" target="_blank" rel="noreferrer"&gt;https://github.com/liuzhilong62/blogs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;WeChat Official Account: 破斯特贵斯库儿&lt;/li&gt;
&lt;li&gt;Other platforms: we&amp;rsquo;ll see&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Final Thoughts
 &lt;div id="final-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#final-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I seem to talk about work-learning balance every year&amp;hellip; Due to a dramatic increase in workload this year, there was even a period where I couldn&amp;rsquo;t study at all. Balance has been shattered. Not having time to study is unacceptable to me, so I later adjusted my daily schedule (thanks to &amp;ldquo;Atomic Habits&amp;rdquo; — I absolutely love this book), and finally managed to squeeze out some study time. Actually, as long as no one&amp;rsquo;s around, learning efficiency is high.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve collected some quotes I resonated with this year:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t let others become dependencies in your task chain &amp;ndash;heisenberg.liu&lt;/li&gt;
&lt;li&gt;Plans that require execution are generally simple plans &amp;ndash;heisenberg.liu&lt;/li&gt;
&lt;li&gt;Things not implemented equal things not done &amp;ndash;somebody&lt;/li&gt;
&lt;li&gt;Solve problems yourself instead of waiting for others to reply &amp;ndash;somebody&lt;/li&gt;
&lt;li&gt;Important things should be done immediately — waiting even a moment means they won&amp;rsquo;t get done &amp;ndash;somebody&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t do repetitive low-value tasks. Think more about the context behind this requirement &amp;ndash;heisenberg.liu&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t pan for gold in shit. Find ways to get quality information sources &amp;ndash;somebody&lt;/li&gt;
&lt;li&gt;SREs need the ability to configure optimal default parameters and the ability to modify these parameters in bulk &amp;ndash;&amp;ldquo;Enterprise Cloud Computing&amp;rdquo;&lt;/li&gt;
&lt;li&gt;The more miscellaneous tasks you do, the more miscellaneous tasks come your way &amp;ndash;heisenberg.liu&lt;/li&gt;
&lt;li&gt;SREs spend 50% of time on operations and 50% on development &amp;ndash;&amp;ldquo;Enterprise Cloud Computing&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Premature optimization is the root of all evil. Premature code abstraction is also the root of all evil &amp;ndash;somebody&lt;/li&gt;
&lt;li&gt;The speed at which the human brain receives knowledge is limited &amp;ndash;somebody&lt;/li&gt;
&lt;li&gt;If someone won&amp;rsquo;t let you read, leave that person or leave that environment &amp;ndash;heisenberg.liu&lt;/li&gt;
&lt;li&gt;Teams that build knowledge bases are slackers &amp;ndash;somebody&lt;/li&gt;
&lt;li&gt;The value of a standard is determined by the customer &amp;ndash;&amp;ldquo;ITIL 4&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Heroism: working long hours and troubleshooting alone. Long working hours also lead to burnout with the work itself. Those who want to be heroes are only interested in their own achievements and turn a deaf ear to team collaboration &amp;ndash;&amp;ldquo;ITIL 4&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Not all problems need root cause analysis. It depends on the frequency of occurrence and the scope of the failure &amp;ndash;&amp;ldquo;ITIL 4&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Looking back at the plans I set for myself in 2023: only 2 items total, and I completed neither. KPI achievement rate: &lt;strong&gt;0%&lt;/strong&gt; &amp;#x1f604;&lt;/p&gt;
&lt;p&gt;Combining agile operations, agile project management, and OKR thinking: setting a full-year plan for myself at the beginning of the year is simply unreasonable. Looking back at last year and the year before, some of my plans emerged mid-way and won priority battles over other tasks. And some tasks simply couldn&amp;rsquo;t be completed — this should be a normal state. So, I won&amp;rsquo;t set too many flags for myself.&lt;/p&gt;
&lt;p&gt;2025 Plan:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Continue some things&lt;/li&gt;
&lt;li&gt;Think about how to produce output&lt;/li&gt;
&lt;li&gt;Master another track&lt;/li&gt;
&lt;li&gt;PostgreSQL&amp;hellip; haven&amp;rsquo;t figured out what more to do&lt;/li&gt;
&lt;li&gt;Find a way to resume fitness&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>PostgreSQL Ops Experience 2024</title><link>https://lastdba.com/en/2025/01/08/postgresql-ops-experience-2024/</link><pubDate>Wed, 08 Jan 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/01/08/postgresql-ops-experience-2024/</guid><description>&lt;p&gt;This article focuses on common PostgreSQL operations issues — rare edge cases that surface once every two or three years are out of scope.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s primarily a technical ops summary, aiming for clarity and quick applicability. Deep dives at the source-code level are deliberately avoided.&lt;/p&gt;

&lt;h2 class="relative group"&gt;SQL Performance &amp;amp; Execution Plans
 &lt;div id="sql-performance--execution-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-performance--execution-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Sudden Execution Plan Changes
 &lt;div id="sudden-execution-plan-changes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sudden-execution-plan-changes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL does not support optimizer hints natively, and the community has made it clear it never will.
The PG community&amp;rsquo;s stance is roughly: &amp;ldquo;Our optimizer is perfect. If the current plan isn&amp;rsquo;t good enough, it&amp;rsquo;s because the developer doesn&amp;rsquo;t understand optimization.&amp;rdquo;&lt;/p&gt;</description><content:encoded>&lt;p&gt;This article focuses on common PostgreSQL operations issues — rare edge cases that surface once every two or three years are out of scope.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s primarily a technical ops summary, aiming for clarity and quick applicability. Deep dives at the source-code level are deliberately avoided.&lt;/p&gt;

&lt;h2 class="relative group"&gt;SQL Performance &amp;amp; Execution Plans
 &lt;div id="sql-performance--execution-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-performance--execution-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Sudden Execution Plan Changes
 &lt;div id="sudden-execution-plan-changes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sudden-execution-plan-changes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL does not support optimizer hints natively, and the community has made it clear it never will.
The PG community&amp;rsquo;s stance is roughly: &amp;ldquo;Our optimizer is perfect. If the current plan isn&amp;rsquo;t good enough, it&amp;rsquo;s because the developer doesn&amp;rsquo;t understand optimization.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Regardless of what the PG community thinks, sudden execution plan regressions happen all the time in production, and we don&amp;rsquo;t have the rich, native plan-binding mechanisms that Oracle provides. This is a real challenge for production operations. For example: one morning, a sensitive query suddenly changes its plan, runtime jumps from 0.1s to 1s, and due to some concurrency the database CPU gets hammered — the business notices immediately. Without plan-binding tools, our only two rapid recovery options are: 1) collect statistics, or 2) scale up CPU.&lt;/p&gt;
&lt;p&gt;A question about rapid recovery: does collecting statistics always help? A good DBA can identify where the optimizer went wrong, but can&amp;rsquo;t instantly conjure up a complete correct plan — especially for complex queries. Collecting statistics essentially hands the optimization problem back to the optimizer, trusting it to get it right. While this sounds a bit shaky, in PostgreSQL it actually works most of the time. (For scenarios where collecting stats is known to be useless, see the &amp;ldquo;ORDER BY LIMIT Problem&amp;rdquo; section.)&lt;/p&gt;
&lt;p&gt;Why do execution plans suddenly change and regress?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Plans are cost-based, costs rely on statistics, and statistics are always lagging&lt;/li&gt;
&lt;li&gt;Sufficiently complex SQL has a huge number of possible execution paths, and the optimizer picks the lowest-cost one&lt;/li&gt;
&lt;li&gt;PG exposes many optimizer parameters to tune for local hardware (e.g., &lt;code&gt;seq_page_cost&lt;/code&gt;, &lt;code&gt;effective_cache_size&lt;/code&gt;). These can nudge the optimizer&amp;rsquo;s preferences but are very low-level. While there&amp;rsquo;s theoretical tuning headroom, changing them has system-wide effects. After go-live, adjusting these is extremely high-risk. The very existence of these parameters hints that no plan can be 100% perfect, because the optimizer&amp;rsquo;s reasoning depends on its environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Even mighty Oracle, with its arsenal of plan-stabilization features, can&amp;rsquo;t guarantee 100% problem-free SQL — because SQL, data, statistics, bind variables, etc. are all dynamic.&lt;/p&gt;
&lt;p&gt;For PG users, we&amp;rsquo;re not there yet, but we can work on making plans more stable:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Don&amp;rsquo;t join too many tables. More tables mean more possible plans — to the point where &lt;a href="https://www.postgresql.org/docs/16/geqo-pg-intro.html" target="_blank" rel="noreferrer"&gt;PG GEQO&lt;/a&gt; stops enumerating all plans, reducing the chance of finding the optimal one&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t write overly complex SQL. Keep in mind SQL may come from ORM frameworks rather than hand-written queries. Framework-generated SQL is often optimized for a goal with little regard for brevity or readability, making it very hard to tune&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t create indexes indiscriminately — have a clear goal. Random indexes confuse the optimizer&lt;/li&gt;
&lt;li&gt;Tune per-table statistics collection thresholds via &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; (see &amp;ldquo;Delayed Statistics Collection&amp;rdquo;)&lt;/li&gt;
&lt;li&gt;Use pg_hint_plan to give the optimizer hints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;pg_hint_plan
 &lt;div id="pg_hint_plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_hint_plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/ossc-db/pg_hint_plan" target="_blank" rel="noreferrer"&gt;pg_hint_plan&lt;/a&gt; is a third-party extension that uses hints to guide the optimizer toward the correct plan.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What pg_hint_plan supports:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Specifying scan methods (e.g., index scan), join methods (NL/HASH/MERGE), join order, memoize, estimated row counts, parallelism, and GUC parameters&lt;/li&gt;
&lt;li&gt;Binding hints to SQL via &lt;code&gt;hint_plan.hints&lt;/code&gt; without modifying the application SQL text&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;pg_hint_plan limitations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Usage restrictions with subqueries, foreign tables, CTEs, views, PL/pgSQL, etc.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;compute_query_id&lt;/code&gt; treats hints as comments and ignores them&lt;/li&gt;
&lt;li&gt;Unknown bugs&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While this extension is actively maintained, I haven&amp;rsquo;t found large-scale production deployment cases yet. We&amp;rsquo;ve also encountered issues in limited production use where hints don&amp;rsquo;t take effect — possibly related to JDBC plan caching — but it&amp;rsquo;s hard to draw firm conclusions.&lt;/p&gt;
&lt;p&gt;In short: pg_hint_plan is a good tool, but large-scale production deployment is still TBD. I recommend waiting and watching. You can trial it, but don&amp;rsquo;t become dependent on it.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Delayed Statistics Collection
 &lt;div id="delayed-statistics-collection" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#delayed-statistics-collection" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Statistics are the foundation of SQL optimization. PG statistics aren&amp;rsquo;t particularly complex, but many people still don&amp;rsquo;t fully understand them.&lt;/p&gt;
&lt;p&gt;The three key views for PG statistics: &lt;code&gt;pg_class&lt;/code&gt;, &lt;code&gt;pg_stat_all_tables&lt;/code&gt;, &lt;code&gt;pg_stats&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_class: pages and tuples
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relpages,reltuples::bigint &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpg&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;187501&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reltuples &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6000032&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_stat_all_tables: live tuples, dead tuples, last analyze time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,n_live_tup,n_dead_tup,last_analyze,last_autoanalyze &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_all_tables &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpg&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_live_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6000032&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_dead_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_analyze &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2025&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;553057&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_autoanalyze &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_stats: per-column statistics — understand every field
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stats &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; tablename&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpg&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; attname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schemaname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tablename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;attname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inherited &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;null_frac &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;avg_width &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_distinct &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_vals &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_freqs &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;histogram_bounds &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;correlation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_elems &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_elem_freqs &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;elem_count_histogram &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Stale statistics are very likely to cause execution plan changes and SQL performance issues.
Check &lt;code&gt;last_autovacuum&lt;/code&gt; and &lt;code&gt;last_autoanalyze&lt;/code&gt; in &lt;code&gt;pg_stat_all_tables&lt;/code&gt; to determine if collection is lagging.&lt;/p&gt;
&lt;p&gt;Why tune it? Because the default &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; is 0.1, meaning statistics are only collected when data changes exceed 10%. For a 1-billion-row table, that&amp;rsquo;s 100 million rows — possibly far too infrequent.&lt;/p&gt;
&lt;p&gt;Evaluate whether to tune per-table &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; and &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; based on: whether it&amp;rsquo;s a core business table, number of joins, query complexity, access frequency, month-boundary issues, data skew, etc. The goal: increase collection frequency to reduce plan-regression risk without wasting resources on excessive vacuuming.&lt;/p&gt;
&lt;p&gt;What value should you set? An example:&lt;/p&gt;
&lt;p&gt;For a monthly table (or monthly partition) with queries hitting the current day&amp;rsquo;s data: with &lt;code&gt;autovacuum_analyze_scale_factor = 0.1&lt;/code&gt;, the table gets analyzed almost daily for the first ~10 days, but may skip analysis around day 12. At that point statistics can cross a boundary and plans may degrade. To ensure analysis continues through days 10–31 of the month, set &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; below &lt;code&gt;0.03&lt;/code&gt;. I recommend &lt;code&gt;autovacuum_analyze_scale_factor = 0.02&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Parameter tuning reference (consider your table&amp;rsquo;s data model!):&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Recommended&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.2&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.04&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;0.02&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;The Optimizer May Choose a Non-Primary-Key Index
 &lt;div id="the-optimizer-may-choose-a-non-primary-key-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-optimizer-may-choose-a-non-primary-key-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Intuitively, a primary key should have the best selectivity, but the optimizer may still choose something else.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Reproduction commands
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t1(a char(&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,b char(&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text),md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxa &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxb &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1(b);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; t1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Columns a and b have identical selectivity, but the optimizer picks the regular index, not the PK
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxb &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2008&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;045&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;046&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (b &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Force the PK path — cost is only marginally higher
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxa &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2008&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)&lt;span style="color:#f92672"&gt;`&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((b)::text &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared &lt;span style="color:#66d9ef"&gt;read&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Even though columns a and b have the same type and selectivity, the optimizer picks the regular index over the PK. The PK path costs 0.01 more.&lt;/p&gt;
&lt;p&gt;Why does this matter?&lt;/p&gt;
&lt;p&gt;With the current data distribution, picking the regular index is harmless. But once data changes, the two index plans can diverge dramatically:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (autovacuum_enabled &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;off&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text),&lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;20001&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;30000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- b=&amp;#39;repeat&amp;#39; has terrible selectivity, but the b index is still chosen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxb &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2008&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;823&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;824&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (b &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2511&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Compare with the PK plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; b&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxa &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2008&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;::bpchar)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((b)::text &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;repeat&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Even with poor real selectivity, the optimizer sticks with the regular index — but efficiency is far worse (shared hit=2511 vs. shared hit=3). For latency-sensitive queries or larger data volumes, this becomes a real production problem.&lt;/p&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Manually collect statistics; increase collection frequency&lt;/li&gt;
&lt;li&gt;Use pg_hint_plan&lt;/li&gt;
&lt;li&gt;Rewrite the SQL to prevent it from using the regular index&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;The ORDER BY LIMIT Problem
 &lt;div id="the-order-by-limit-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-order-by-limit-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ORDER BY with LIMIT is a well-known issue with plenty of write-ups and case studies online (see my post &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/ORDER%20BY%20limit%2010%E6%AF%94ORDER%20BY%20limit%20100%E6%9B%B4%E6%85%A2.md" target="_blank" rel="noreferrer"&gt;ORDER BY LIMIT 10 Is Slower Than ORDER BY LIMIT 100&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The root cause: the optimizer currently can&amp;rsquo;t estimate where data sits in the table relative to the index order. If matching rows happen to be near the end of the table, the scan reads far more data than expected before returning the LIMIT rows. Note this isn&amp;rsquo;t limited to ORDER BY + LIMIT — any operation involving sorted output + LIMIT can hit it: GROUP BY + LIMIT, DISTINCT + LIMIT, merge joins, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Solutions:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rewrite the SQL: add an expression to prevent using the sort-column index (including PK), e.g., &lt;code&gt;order by ''||col1 limit xxx&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Create a composite index: a composite index on (sort_column + index_column) may be chosen by the optimizer and is generally more efficient than an index on the sort column alone. This approach doesn&amp;rsquo;t require changing the SQL&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Table Bloat
 &lt;div id="table-bloat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#table-bloat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Something Blocking Dead Tuple Cleanup
 &lt;div id="something-blocking-dead-tuple-cleanup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#something-blocking-dead-tuple-cleanup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Putting aside autovacuum configuration issues and edge cases, the common blockers are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Long-running transactions. Note: a long transaction on a &lt;em&gt;different&lt;/em&gt; table also blocks dead-tuple reclamation. Read-only queries cause this too.&lt;/li&gt;
&lt;li&gt;Replication slots. Lagging or defunct replication slots cause this.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both are relatively easy to solve: 1) terminate the long-transaction session, 2) drop the replication slot, or have the consumer analyze why consumption is so slow.&lt;/p&gt;

&lt;h3 class="relative group"&gt;High-Concurrency UPDATE Causing Table Bloat
 &lt;div id="high-concurrency-update-causing-table-bloat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#high-concurrency-update-causing-table-bloat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Unlike something blocking vacuum, this is about dead tuples being generated faster than vacuum can clean them up. Typically, such tables show high &lt;code&gt;pg_stat_all_tables.n_tup_upd&lt;/code&gt;. If table bloat requires repack, assess whether write volume is high enough to make repeated manual repack a losing game. In that case, tune the table/index &lt;code&gt;fillfactor&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;For the underlying principles, see this post &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%BA%90%E7%A0%81%E8%A7%A3%E6%9E%90/%E4%BB%8E%E5%BE%88%E6%85%A2%E7%9A%84%E5%94%AF%E4%B8%80%E7%B4%A2%E5%BC%95%E6%89%AB%E6%8F%8F%E5%88%B0%E7%B4%A2%E5%BC%95%E8%86%A8%E8%83%80.md" target="_blank" rel="noreferrer"&gt;From Painfully Slow Unique Index Scans to Index Bloat&lt;/a&gt;. I&amp;rsquo;ll summarize the conclusions here:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fillfactor basics:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;fillfactor acts as a high-water mark for tables or indexes. During INSERT, once a page reaches its fillfactor line, new rows go to the next page. The purpose is to reserve space for UPDATEs so they don&amp;rsquo;t constantly seek new pages.&lt;/p&gt;
&lt;p&gt;While both tables and indexes have fillfactor with the same goal (accommodating UPDATEs), the details differ significantly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tables: If a page still has free space, an UPDATE can stay within the same page — no new page needed, no need to find another page with space. More importantly, thanks to PG&amp;rsquo;s HOT (Heap-Only Tuple) feature, in-page updates don&amp;rsquo;t touch indexes, naturally slowing index bloat&lt;/li&gt;
&lt;li&gt;Indexes: Different rows or out-of-page updates of the same row generate new index entries. Reserving space in index pages via fillfactor greatly reduces index page splits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, fillfactor settings are tightly coupled with the workload. If data is append-only like logs with zero updates, fillfactor=100 for both tables and indexes is perfectly fine. But most business tables see updates, so fillfactor shouldn&amp;rsquo;t be 100. With frequent UPDATEs, it should be even lower.&lt;/p&gt;
&lt;p&gt;Yet PG&amp;rsquo;s defaults are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Table default: fillfactor=100&lt;/li&gt;
&lt;li&gt;Index default: fillfactor=90&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Recommended settings:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpg &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (fillfactor&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; lzlpg_pkey &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (fillfactor&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- These commands only affect new pages; existing pages need repack
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Repack:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;. &lt;span style="color:#66d9ef"&gt;Check&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; long transactions; resolve them &lt;span style="color:#66d9ef"&gt;first&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;. nohup pg_repack &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#75715e"&gt;--table lzlpg -p 6666 -no-kill-backend &amp;gt; pgrepack_lzlpg_log.log 2&amp;gt;&amp;amp;1 &amp;amp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Long Transaction Problems
 &lt;div id="long-transaction-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#long-transaction-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Long transactions don&amp;rsquo;t have a huge amount of theory behind them — monitor and handle promptly — but they absolutely deserve their own section.&lt;/p&gt;
&lt;p&gt;Long transactions cause many problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unreleased locks → application blocking&lt;/li&gt;
&lt;li&gt;WAL not recycled → disk alerts&lt;/li&gt;
&lt;li&gt;Dead tuples not cleaned → SQL performance degradation&lt;/li&gt;
&lt;li&gt;Various other bizarre performance issues linked to long transactions&lt;/li&gt;
&lt;li&gt;&amp;hellip;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Long transactions in PostgreSQL are far more damaging than in Oracle or MySQL. They must be strictly managed.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Subtransaction Problems
 &lt;div id="subtransaction-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subtransaction-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;&amp;ldquo;Subtransactions are basically cursed. Rip em out.&amp;rdquo;&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Subtransactions cause many problems and are a frequent pain point in the industry.&lt;/p&gt;
&lt;p&gt;Industry experience reports:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pganalyze.com/blog/5mins-postgres-17-configurable-slru-cache" target="_blank" rel="noreferrer"&gt;Waiting for Postgres 17: Configurable SLRU cache sizes for increased performance&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://knowledge.enterprisedb.com/hc/en-us/articles/13523268146972-Subtransactions-overflow-and-the-performance-cliff" target="_blank" rel="noreferrer"&gt;Subtransactions-overflow-and-the-performance-cliff&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;Why we spent the last month eliminating PostgreSQL subtransactions&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Where subtransactions come from:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;PL/pgSQL&lt;/code&gt; functions containing a block with an &lt;strong&gt;exception&lt;/strong&gt; clause&lt;/li&gt;
&lt;li&gt;&lt;code&gt;savepoints&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;JDBC + &lt;a href="https://jdbc.postgresql.org/documentation/use/" target="_blank" rel="noreferrer"&gt;autosave=always&lt;/a&gt; (default &lt;code&gt;autosave=never&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;ODBC&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: OGG uses an ODBC driver, and ODBC cannot disable subtransactions.&lt;/p&gt;
&lt;p&gt;GaussDB&amp;rsquo;s ODBC can disable subtransactions via &lt;a href="https://support.huaweicloud.com/intl/en-us/centralized-devg-v8-gaussdb/gaussdb-42-0098.html" target="_blank" rel="noreferrer"&gt;ForExtensionConnector&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;So we can advise applications to keep subtransactions under 64, but we can&amp;rsquo;t easily advise against using OGG, since migrating off Oracle often depends on OGG-based data sync tools.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Subtransaction problem scenarios and symptoms:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1(+) long transaction + subtransaction overflow + high concurrency → severe performance drop&lt;/li&gt;
&lt;li&gt;Subtransaction overflow (64+) → noticeable performance dip&lt;/li&gt;
&lt;li&gt;Subtransaction overflow (64+) + multixact → severe performance drop&lt;/li&gt;
&lt;li&gt;1(+) long transaction + 1(+) subtransaction → severe query performance drop on read replicas&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Improvements in PG17:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;SLRU manages transaction relationships for clog, multixact, subtrans, etc. in shared memory. Relevant source definitions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Number of SLRU buffers to use for subtrans */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_SUBTRANS_BUFFERS	32 &lt;/span&gt;&lt;span style="color:#75715e"&gt;// 32 SLRU pages in shared memory
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Each backend advertises up to PGPROC_MAX_CACHED_SUBXIDS TransactionIds
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * for non-aborted subtransactions of its current top transaction. These
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * have to be treated as running XIDs by other backends.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * We also keep track of whether the cache overflowed (ie, the transaction has
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * generated at least one subtransaction that didn&amp;#39;t fit in the cache).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * If none of the caches have overflowed, we can assume that an XID that&amp;#39;s not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * listed anywhere in the PGPROC array is not a running transaction. Else we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * have to look at pg_subtrans.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define PGPROC_MAX_CACHED_SUBXIDS 64	&lt;/span&gt;&lt;span style="color:#75715e"&gt;// Overflow at 64+, per backend
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG17 SLRU improvements:
New GUC parameter to configure SLRU slot count; split the existing single centralized SLRU lock into multiple bank locks.&lt;/p&gt;
&lt;p&gt;Improvement effect:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d712858437e4.png" alt="image.png" /&gt;
(&lt;a href="https://www.pgevents.ca/events/pgconfdev2024/sessions/session/53/slides/27/SLRU%20Performance%20Issues.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgevents.ca/events/pgconfdev2024/sessions/session/53/slides/27/SLRU%20Performance%20Issues.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Subtransaction handling strategies:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dev standards: Don&amp;rsquo;t use &lt;code&gt;savepoints&lt;/code&gt;; consider &lt;code&gt;ON CONFLICT&lt;/code&gt; for write conflicts&lt;/li&gt;
&lt;li&gt;Dev standards: Don&amp;rsquo;t use &lt;code&gt;exception&lt;/code&gt; blocks&lt;/li&gt;
&lt;li&gt;Dev standards: Ensure JDBC does &lt;em&gt;not&lt;/em&gt; have &lt;code&gt;autosave=always&lt;/code&gt; enabled&lt;/li&gt;
&lt;li&gt;Monitoring: Targeted monitoring of &lt;code&gt;pg_stat_slru&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Monitoring: Targeted monitoring of &lt;code&gt;SAVEPOINT&lt;/code&gt; and &lt;code&gt;EXCEPTION&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;CDC standards: Use ODBC (and OGG or other ODBC-based tools) with care; split transactions, cap subtransactions per large transaction at 50K&lt;/li&gt;
&lt;li&gt;Upgrade: Move to PG17&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Concurrency &amp;amp; Performance
 &lt;div id="concurrency--performance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concurrency--performance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Snapshot and Concurrency Parameter Tuning
 &lt;div id="snapshot-and-concurrency-parameter-tuning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-and-concurrency-parameter-tuning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Type&lt;/th&gt;
 &lt;th&gt;Default&lt;/th&gt;
 &lt;th&gt;Recommended&lt;/th&gt;
 &lt;th&gt;Requires Restart&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;old_snapshot_threshold&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;cpu&lt;/td&gt;
 &lt;td&gt;-1 (community)&lt;/td&gt;
 &lt;td&gt;-1&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;max_parallel_workers_per_gather&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;cpu&lt;/td&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;0&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;code&gt;old_snapshot_threshold&lt;/code&gt; easily causes performance problems when enabled — there&amp;rsquo;s plenty of material online. Even though it requires a restart, I strongly recommend keeping it disabled.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max_parallel_workers_per_gather&lt;/code&gt; auto-enables parallelism for large queries, but parallelism of 2 won&amp;rsquo;t give a proportional 2x speedup. This parameter is best used in specific scenarios, like explicitly setting parallel workers for batch jobs. Since no restart is needed, it&amp;rsquo;s a quick change.&lt;/p&gt;
&lt;p&gt;Will disabling &lt;code&gt;old_snapshot_threshold&lt;/code&gt; cause problems?&lt;/p&gt;
&lt;p&gt;No. This parameter exists to limit long transactions — which do damage performance in PG — but the parameter itself causes performance issues, defeating the purpose.&lt;/p&gt;
&lt;p&gt;Long transactions can be handled via several mechanisms:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Long transaction monitoring. This is the most important, and monitoring is fairly mature.&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;statement_timeout&lt;/code&gt; (default 0)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;transaction_timeout&lt;/code&gt; (default 0, available in PG17+)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;lock_timeout&lt;/code&gt; (default 0; recommended at session level for DDL)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;idle_in_transaction_session_timeout&lt;/code&gt; (default 0; we set it to 2h)&lt;/li&gt;
&lt;li&gt;Set &lt;code&gt;idle_session_timeout&lt;/code&gt; (default 0; not relevant here)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;High-Concurrency Commits Causing LWLOCK:WALWrite
 &lt;div id="high-concurrency-commits-causing-lwlockwalwrite" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#high-concurrency-commits-causing-lwlockwalwrite" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/%E6%A1%88%E4%BE%8B-insert%20value%E5%81%B6%E5%8F%91%E6%85%A2%E5%88%86%E6%9E%90.md" target="_blank" rel="noreferrer"&gt;Case Study: Intermittent Slow INSERT &amp;hellip; VALUES&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;There&amp;rsquo;s only one IO:WALWrite, but there can be dozens of LWLOCK:WALWrite waiters&lt;/li&gt;
&lt;li&gt;You can&amp;rsquo;t directly see the LWLOCK blocking chain, but from the source code we know LWLOCK:WALWrite is waiting on IO:WALWrite&lt;/li&gt;
&lt;li&gt;In high-concurrency small-transaction scenarios, increasing WAL buffer size theoretically doesn&amp;rsquo;t help much&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What problems does this cause?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Concurrent writes block, write latency increases, active sessions may spike&lt;/li&gt;
&lt;li&gt;High-concurrency small transactions can&amp;rsquo;t saturate disk IO&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distribute concurrent writes across time&lt;/li&gt;
&lt;li&gt;Batch commits at the application level&lt;/li&gt;
&lt;li&gt;Analyze and try to reduce FPI (see FPI section)&lt;/li&gt;
&lt;li&gt;Group commit (&lt;a href="https://www.postgresql.org/docs/17/runtime-config-wal.html#GUC-COMMIT-DELAY" target="_blank" rel="noreferrer"&gt;TBD&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;WAL &amp;amp; Latency
 &lt;div id="wal--latency" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal--latency" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;FPI and Checkpoint Parameters
 &lt;div id="fpi-and-checkpoint-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fpi-and-checkpoint-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG generates WAL FPI (Full Page Images) the first time a page is touched after a checkpoint. So more frequent checkpoints → higher probability of FPI.&lt;/p&gt;
&lt;p&gt;Checkpoint frequency is controlled by two parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;checkpoint_timeout&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_wal_size&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Principle:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0197be136174.png" alt="image.png" /&gt;
(Egor Rogov, PostgreSQL 14 Internals)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max_wal_size&lt;/code&gt; defaults to 1GB, which is too small for high-load databases. Generally, you should increase this parameter to reduce FPI.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;checkpoint_timeout&lt;/code&gt; defaults to 5 minutes, which seems reasonable.&lt;/p&gt;

&lt;h3 class="relative group"&gt;FPI and Random Writes
 &lt;div id="fpi-and-random-writes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fpi-and-random-writes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Even with longer checkpoint intervals, FPI problems may persist. Check whether the workload involves UUID-based random writes. You may need to switch to sequences or another UUID scheme.&lt;/p&gt;
&lt;p&gt;Finding the specific index:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check if FPI is severe&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;--stats=record&lt;/code&gt; is handy&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump -z --stats&lt;span style="color:#f92672"&gt;=&lt;/span&gt;record 00000001000001860000001B&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Sort which relations have the most FPWs&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump 00000001000001860000001B|grep FPW|awk -F &lt;span style="color:#e6db74"&gt;&amp;#39;:&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;{print $7}&amp;#39;&lt;/span&gt;|awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2}&amp;#39;&lt;/span&gt;|sort -n|uniq -c |sort -r|head -10&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Logical Replication &amp;amp; Replication Slots
 &lt;div id="logical-replication--replication-slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-replication--replication-slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Logical replication has many issues and is a key optimization area for the community — nearly every major version brings significant improvements.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/PostgreSQL%E9%80%BB%E8%BE%91%E5%A4%8D%E5%88%B6.md" target="_blank" rel="noreferrer"&gt;Logical Replication and Replication Slots Basics&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Spill Problem
 &lt;div id="spill-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spill-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/PG%E8%B5%B7%E5%BA%93%E9%80%BB%E8%BE%91%E5%92%8Cspill%E5%AF%BC%E8%87%B4%E8%B5%B7%E5%BA%93%E6%85%A2%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90.md" target="_blank" rel="noreferrer"&gt;Analysis of PG Startup Logic and Spill-Induced Slow Startup&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Spill key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Spill occurs when logical decoding can&amp;rsquo;t fit transaction data in memory, so it writes to disk. Spill files contain transaction information&lt;/li&gt;
&lt;li&gt;Each walsender has independent decoding, so each logical replication subscriber has its own spill&lt;/li&gt;
&lt;li&gt;Large transactions produce large spill files, typically few in number&lt;/li&gt;
&lt;li&gt;Subtransaction spill produces one spill file per subtransaction&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Versions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG12 and earlier: hard-coded 4096 changes&lt;/li&gt;
&lt;li&gt;PG13 added &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; to adjust memory and reduce spill probability&lt;/li&gt;
&lt;li&gt;PG14+ supports streaming replication&lt;/li&gt;
&lt;li&gt;Streaming also requires certain conditions to trigger, so even with streaming, spilling can still occur&lt;/li&gt;
&lt;li&gt;PG17 added &lt;code&gt;debug_logical_replication_streaming&lt;/code&gt; to force streaming&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;WALSender Blocking Shutdown
 &lt;div id="walsender-blocking-shutdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-blocking-shutdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/PG%E5%81%9C%E5%BA%93%E9%80%BB%E8%BE%91%E5%92%8Cwalsender%E9%98%BB%E6%AD%A2%E5%81%9C%E5%BA%93%E9%97%AE%E9%A2%98%E5%88%86%E6%9E%90.md" target="_blank" rel="noreferrer"&gt;PG Shutdown Logic and WALSender Blocking Shutdown Analysis&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In reality, &lt;em&gt;any&lt;/em&gt; process that doesn&amp;rsquo;t exit can block shutdown. The question is which ones are most likely to cause trouble. From the shutdown code flow, archiver and walsender are frequent blockers because during shutdown they attempt a final archive or log transmission.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://camo.githubusercontent.com/45e44c384cdf1c41caf9d2018076cf420cd48c9d49be1b2078262b4303be2627/68747470733a2f2f6f73732d656d637370726f642d7075626c69632e6d6f64622e70726f2f696d6167652f656469746f722f32303235303130342d313837353338333238343037393830343431365f343538372e706e67" alt="" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If shutdown is stuck on walsender, try &lt;code&gt;kill&lt;/code&gt; (not &lt;code&gt;kill -9&lt;/code&gt;) — the checkpoint hasn&amp;rsquo;t finished yet, and a forced shutdown leaves an inconsistent state. Even for forced shutdown, prefer &lt;code&gt;pg_ctl stop -D $PGDATA -m i&lt;/code&gt; over raw &lt;code&gt;kill -9&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If shutdown is stuck on archiver, &lt;code&gt;kill -9&lt;/code&gt; is fine — the checkpoint is already complete and the database is in a consistent state&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Partitioned Tables
 &lt;div id="partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/PostgreSQL%E5%88%86%E5%8C%BA%E8%A1%A8.md" target="_blank" rel="noreferrer"&gt;Partitioned Table Basics&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s partitioned tables have unique characteristics that developers generally don&amp;rsquo;t fully understand without study, leading to many pitfalls.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Index Mismatch Between Parent and Child Partitions
 &lt;div id="index-mismatch-between-parent-and-child-partitions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-mismatch-between-parent-and-child-partitions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Due to non-standard partition creation, many indexes are created directly on child tables (which should not be done), and the &amp;ldquo;create index on all children + attach&amp;rdquo; workflow is skipped. The result: the parent table has no index or no effective index. Since the parent has no data, this doesn&amp;rsquo;t directly impact queries — but when new partitions are created, they only inherit the parent&amp;rsquo;s indexes, so new child tables end up missing indexes.&lt;/p&gt;
&lt;p&gt;Fixing parent-table missing indexes is fairly straightforward: see &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/PostgreSQL%E5%88%86%E5%8C%BA%E8%A1%A8.md#%E5%88%9B%E5%BB%BA%E5%88%86%E5%8C%BA%E7%B4%A2%E5%BC%95%E7%9A%84%E6%AD%A3%E7%A1%AE%E5%A7%BF%E5%8A%BF" target="_blank" rel="noreferrer"&gt;The Correct Way to Create Partition Indexes&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create an invalid index ONLY on the parent. Fast, but blocks subsequent DML — watch for long transactions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; IDX_DATECREATED &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create the index CONCURRENTLY on each child partition. Slow, but doesn&amp;#39;t block DML — watch for long DML transactions that could cause the operation to fail
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202302 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Attach all indexes. Fast, no business blocking
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202302;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Fixing a missing primary key on the parent is harder: see &lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E5%86%85%E5%8A%9F%E4%BF%AE%E7%82%BC/PostgreSQL%E5%88%86%E5%8C%BA%E8%A1%A8.md#%E5%88%86%E5%8C%BA%E8%A1%A8%E6%B7%BB%E5%8A%A0%E4%B8%BB%E9%94%AE%E5%92%8C%E5%94%AF%E4%B8%80%E7%B4%A2%E5%BC%95" target="_blank" rel="noreferrer"&gt;Adding Primary Keys and Unique Indexes to Partitioned Tables&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Adding a primary key on the parent acquires &lt;code&gt;AccessExclusiveLock&lt;/code&gt;, blocking everything. Creating an index on a partitioned table is slow, and the PK then causes further blocking. There&amp;rsquo;s currently no low-impact way to add a PK on a partitioned table. Workarounds: &amp;ldquo;attach a unique index + NOT NULL constraint&amp;rdquo;, schedule extended downtime for the partition table while the index builds, or use a third-party sync tool to populate a new table that already has the PK.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Abusing the DEFAULT Partition
 &lt;div id="abusing-the-default-partition" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#abusing-the-default-partition" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://github.com/liuzhilong62/blogs/blob/main/PostgreSQL%E6%A1%88%E4%BE%8B/%E6%B2%A1%E6%9C%89%E9%98%BB%E5%A1%9E%E4%B8%BA%E4%BB%80%E4%B9%88partition%20of%E5%88%9B%E5%BB%BA%E5%AD%90%E5%88%86%E5%8C%BA%E5%BE%88%E6%85%A2%EF%BC%9F.md" target="_blank" rel="noreferrer"&gt;Default Partition Overgrowth Causing Prolonged Blocking During &lt;code&gt;CREATE TABLE ... PARTITION OF&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The root cause is simple: when adding a new partition, the DDL must validate that data in the DEFAULT partition doesn&amp;rsquo;t conflict with the new partition&amp;rsquo;s range. This scans a large amount of data in the DEFAULT partition, and the new partition creation never completes. Blocking then cascades — business queries and writes stall.&lt;/p&gt;
&lt;p&gt;DEFAULT partition abuse is a widespread problem! The community PG doesn&amp;rsquo;t provide interval partitioning. If a developer forgets to create a partition, data silently lands in DEFAULT with no error or alert. Day after day, the DEFAULT partition grows enormous — and then the next schema change causes an outage.&lt;/p&gt;
&lt;p&gt;You can&amp;rsquo;t leave an oversized DEFAULT partition as-is forever. Even though ATTACH can avoid the blocking problem, you still need to defuse this bomb eventually.&lt;/p&gt;
&lt;p&gt;DEFAULT partition data handling — Plan 1:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;DETACH the default partition, create proper partitions, then re-insert DEFAULT data into the partitioned table&lt;/li&gt;
&lt;li&gt;If needed, after detach and creating proper partitions, create an empty DEFAULT partition to maintain business continuity&lt;/li&gt;
&lt;li&gt;Note: DETACH (unlike ATTACH) requires an AccessExclusiveLock on the parent. PG14 supports DETACH CONCURRENTLY, but not for DEFAULT partitions&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;DEFAULT partition data handling — Plan 2:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;DETACH the default partition, create proper partitions, then ATTACH the detached DEFAULT table as a regular child partition — careful with range boundaries&lt;/li&gt;
&lt;li&gt;If needed, after detach and creating proper partitions, create an empty DEFAULT partition to maintain business continuity&lt;/li&gt;
&lt;li&gt;Note: DETACH (unlike ATTACH) requires an AccessExclusiveLock on the parent. PG14 supports DETACH CONCURRENTLY, but not for DEFAULT partitions&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;DEFAULT partition data handling — Plan 3:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a new table, sync all data via DTS&lt;/li&gt;
&lt;li&gt;Rename tables&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Plan 3 looks the crudest, but it&amp;rsquo;s the one I personally recommend most. If you have 5 instances to fix, a surgical approach is fine. If you have 200 instances, the labor cost makes DTS the practical winner.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Missing SELECT Privileges on Partitions Causing Abnormal Plans
 &lt;div id="missing-select-privileges-on-partitions-causing-abnormal-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#missing-select-privileges-on-partitions-causing-abnormal-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If a user lacks SELECT privilege on a child partition, their queries can&amp;rsquo;t access that partition&amp;rsquo;s statistics, leading to bad execution plans. Partitions created via &lt;code&gt;CREATE TABLE ... PARTITION OF&lt;/code&gt; normally don&amp;rsquo;t carry SELECT grants — but data is accessible through the parent — so this is a widespread issue.&lt;/p&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Have the cloud platform handle it automatically&lt;/li&gt;
&lt;li&gt;Enforce dev standards requiring SELECT grants on child partitions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;High-Concurrency Full Partition Scans and LWLock:lockmanager
 &lt;div id="high-concurrency-full-partition-scans-and-lwlocklockmanager" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#high-concurrency-full-partition-scans-and-lwlocklockmanager" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;This is another very common problem!&lt;/p&gt;
&lt;p&gt;I recommend reading the AWS documentation, which explains it clearly: &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/wait-event.lw-lock-manager.html" target="_blank" rel="noreferrer"&gt;https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/wait-event.lw-lock-manager.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Symptoms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Spiking active sessions&lt;/li&gt;
&lt;li&gt;Severe LWLock:lockmanager wait events&lt;/li&gt;
&lt;li&gt;Database performance cliff&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Trigger conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Query scans multiple partitions&lt;/li&gt;
&lt;li&gt;That query has high concurrency&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The fastpath lock mechanism is designed for quick access to &amp;ldquo;weak locks&amp;rdquo;, improving database concurrency&lt;/li&gt;
&lt;li&gt;fastpath works for lock levels ≤ 3 — i.e., SELECT, SELECT FOR xxx, and DML (lock modes below &lt;code&gt;ShareUpdateExclusiveLock&lt;/code&gt; — levels 1, 2, 3 can use fastpath). It&amp;rsquo;s meant to benefit normal operations&lt;/li&gt;
&lt;li&gt;&lt;code&gt;FP_LOCK_SLOTS_PER_BACKEND&lt;/code&gt;: a local process holds at most 16 fastpath locks; beyond that, it must acquire locks in shared memory, producing &lt;code&gt;LWLock:lockmanager&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Not just tables — every accessed index also acquires a lock&lt;/li&gt;
&lt;li&gt;This problem isn&amp;rsquo;t tightly coupled to partition count — even a modest number of partitions can trigger &lt;code&gt;LWLock:lockmanager&lt;/code&gt; and degrade performance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&amp;rsquo;s calculate: with a partitioned table having 1 primary key and 2 regular indexes, how many partitions exhaust the fastpath?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; indexes &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;) &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; parent &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; child partitions&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Yes — a full scan across just 3 partitions can already trigger LWLock:lockmanager waits.&lt;/p&gt;
&lt;p&gt;For a regular table, 16 indexes would similarly exhaust fastpath.&lt;/p&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For not-too-large tables, merge partitions into a regular table&lt;/li&gt;
&lt;li&gt;Add partition key filter conditions to queries&lt;/li&gt;
&lt;li&gt;Reduce indexes (not very practical, since partition count alone can exceed 16)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The hard part:&lt;/p&gt;
&lt;p&gt;In Oracle-to-PG migrations, Oracle supports global indexes, so primary keys and unique indexes don&amp;rsquo;t need to include the partition key. In PG, they must include the partition key.&lt;/p&gt;
&lt;p&gt;PK example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idxlzl(primarykey) &lt;span style="color:#75715e"&gt;--oracle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idxlzl(primarykey,partitionkey) &lt;span style="color:#75715e"&gt;--pg&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A common query pattern:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; primarykey&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12345&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Should you push the application to add a partition filter here? It&amp;rsquo;s a tough sell. The resistance is: &amp;ldquo;I already passed the primary key — what more do you want? If I knew everything, why would I query the database?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;In this case, the only recommendation is to convert the partitioned table to a regular table. I haven&amp;rsquo;t found a better solution.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory
 &lt;div id="memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Excessive Objects Leading to Oversized relcache
 &lt;div id="excessive-objects-leading-to-oversized-relcache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#excessive-objects-leading-to-oversized-relcache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;relcache stores relation metadata: OID, pg_class info, partitions, subtransactions, row-level security policies, statistics, index metadata, access methods, etc.&lt;/li&gt;
&lt;li&gt;Each session has its own (rel)cache for system catalog data (metadata, etc.)&lt;/li&gt;
&lt;li&gt;Normally this cache is small. When the catalog is huge and a session accesses all of it, the cache can become very large&lt;/li&gt;
&lt;li&gt;Cache management is simple: no eviction mechanism, no limit (though there are invalidation messages)&lt;/li&gt;
&lt;li&gt;Closing the session releases the cache&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reduce the number of objects — especially check whether partition child tables are excessive&lt;/li&gt;
&lt;li&gt;Set aggressive connection-pool disconnection parameters so business connections recycle more frequently&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Memory Fragmentation
 &lt;div id="memory-fragmentation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-fragmentation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Recommended commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo|grep whatyouneed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/buddyinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## cgroup memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/opt/cgtools/cginfo -t perf -s mem
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Pay attention to pgscand/s (direct memory reclaim) — values in the tens of thousands indicate a problem&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sar -B -s &lt;span style="color:#e6db74"&gt;&amp;#34;08:00:00&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;09:00:00&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# min_free_kbytes setting:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/sys/vm/min_free_kbytes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Total physical memory usage of all processes:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep Pss /proc/&lt;span style="color:#f92672"&gt;[&lt;/span&gt;1-9&lt;span style="color:#f92672"&gt;]&lt;/span&gt;*/smaps | awk &lt;span style="color:#e6db74"&gt;&amp;#39;{total+=$2}; END {printf &amp;#34;%d kB\n&amp;#34;, total }&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# PSS memory for a specific process:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/90875/smaps |grep Pss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# RSS memory for a specific process:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps |grep Rss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Private memory for a specific process:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/90875/smaps|sed &lt;span style="color:#e6db74"&gt;&amp;#39;/zero/,/VmFlags/d&amp;#39;&lt;/span&gt; |grep Private |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;min_free_kbytes:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://camo.githubusercontent.com/ec10b5b4434febdb6675545e2beaa60646be264db9fb8259cd787cdd4771054b/68747470733a2f2f692d626c6f672e6373646e696d672e636e2f626c6f675f6d6967726174652f35653435303466323634303231633438386438613637623962333665666265322e706e67" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/" target="_blank" rel="noreferrer"&gt;https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;When free memory is low, the kswapd daemon is woken to free pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pages_low: when free pages fall below pages_low, buddy allocator wakes kswapd and the kernel begins swapping pages to disk&lt;/li&gt;
&lt;li&gt;pages_min: when free pages reach pages_min, reclamation pressure is high — the zone urgently needs free pages. The allocator performs synchronous kswapd work, sometimes called direct reclaim&lt;/li&gt;
&lt;li&gt;pages_high: once kswapd is awake and freeing pages, the kernel considers the zone &amp;ldquo;balanced&amp;rdquo; only when free pages reach pages_high. At pages_high, kswapd goes back to sleep. Free pages above pages_high means the zone is in an ideal state&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;vm.min_free_kbytes&lt;/code&gt; (the pages_min watermark) is an extremely important OS parameter. Too low a value prevents effective memory reclamation, potentially causing system crashes and service interruptions. Too high a value increases reclaim activity, causing allocation delays that can immediately trigger OOM.&lt;/p&gt;
&lt;p&gt;Optimization results:&lt;/p&gt;
&lt;p&gt;After increasing &lt;code&gt;min_free_kbytes&lt;/code&gt; + deploying off-peak drop-cache jobs, problems have decreased significantly.&lt;/p&gt;
&lt;p&gt;Why increase min_free_kbytes?&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a watermark[WMARK_MIN] value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages based &lt;strong&gt;proportionally&lt;/strong&gt; on its size.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;a href="https://www.kernel.org/doc/html/latest/admin-guide/sysctl/vm.html#min-free-kbytes" target="_blank" rel="noreferrer"&gt;Source: kernel.org docs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The point of raising min_free_kbytes isn&amp;rsquo;t to raise the min watermark and trigger direct reclaim more often — it&amp;rsquo;s because the low watermark couldn&amp;rsquo;t be tuned before Linux 7. The only way to raise low proportionally was to raise min, making asynchronous reclaim trigger earlier and giving direct reclaim a buffer window.&lt;/p&gt;
&lt;p&gt;Red Hat 8 added two memory parameters to improve reclaim: &lt;code&gt;watermark_scale_factor&lt;/code&gt; can raise watermarks without touching &lt;code&gt;min_free_kbytes&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Recommend enabling huge pages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Huge pages perform better when PG requests contiguous memory&lt;/li&gt;
&lt;li&gt;Huge pages also help reduce page cache size&lt;/li&gt;
&lt;li&gt;shared_buffers can use huge pages; requires &lt;code&gt;Huge_pages=on&lt;/code&gt; and OS-level huge pages enabled&lt;/li&gt;
&lt;li&gt;Instances with huge pages enabled in production show better performance and fewer problems&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html#AuroraPostgreSQL.Managing.HugePages" target="_blank" rel="noreferrer"&gt;AWS huge pages standard&lt;/a&gt;: enabled by default for all instance classes except certain test tiers, and cannot be disabled&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;code&gt;Huge_pages&lt;/code&gt; parameter is turned on by default for all DB instance classes other than t3.medium, db.t3.large, db.t4g.medium, db.t4g.large instance classes. You can&amp;rsquo;t change the &lt;code&gt;huge_pages&lt;/code&gt; parameter value or turn off this feature in the supported instance classes of Aurora PostgreSQL.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 class="relative group"&gt;cgroup and Host Memory Mismatch
 &lt;div id="cgroup-and-host-memory-mismatch" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-and-host-memory-mismatch" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When cgroup memory hits its limit, kswapd prioritizes reclaiming pages within the cgroup. With cloud VM instance types and cgroup configurations, the host may have free memory above watermarks while the cgroup is under pressure. The host-level pages_low doesn&amp;rsquo;t trigger asynchronous reclaim for either host or cgroup memory. Eventually, direct reclaim fires to satisfy the cgroup&amp;rsquo;s DB memory demand.&lt;/p&gt;
&lt;p&gt;The root cause: cgroups lack independent free-page memory management.&lt;/p&gt;
&lt;p&gt;The only fix: increase the cgroup memory limit, overcommitting the host more aggressively so the host reaches pages_low sooner.&lt;/p&gt;

&lt;h3 class="relative group"&gt;shared_buffer and pagecache
 &lt;div id="shared_buffer-and-pagecache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared_buffer-and-pagecache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG uses a double-buffer mechanism — no direct IO yet.&lt;/p&gt;
&lt;p&gt;Double buffer: DB shared_buffers (one layer of shared memory) + OS pagecache (another layer). In real deployments, pagecache is typically far larger than shared_buffers. And pagecache counts against cgroup mem but isn&amp;rsquo;t reflected in cgroup memory monitoring&amp;hellip;&lt;/p&gt;
&lt;p&gt;Bottom line: leave plenty of memory for pagecache. Don&amp;rsquo;t make shared_buffers excessively large (20GB seems sufficient for most cases). Only increase it if you clearly observe buffer-mapping-related wait events.&lt;/p&gt;

&lt;h3 class="relative group"&gt;work_mem Cannot Cap Hash Join / Hash Aggregate Memory
 &lt;div id="work_mem-cannot-cap-hash-join--hash-aggregate-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#work_mem-cannot-cap-hash-join--hash-aggregate-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;hash_mem_multiplier&lt;/strong&gt; limits memory for hash-based operations (hash join, hash agg, etc.), capping at &lt;code&gt;hash_mem_multiplier * work_mem&lt;/code&gt;. The default is 2.&lt;/p&gt;
&lt;p&gt;Before PG13, &lt;code&gt;work_mem&lt;/code&gt; was tunable, but there was no way to limit how many hash operations a single query could use. PG13 added this multiplier. In other words, pre-13, it was very hard to cap hash-table memory.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;In a PG12- production environment, I found a single session consuming 300GB of memory — the culprit was the lack of hash-table limits combined with a plan that incorrectly used hash tables.&lt;/em&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Other Issues
 &lt;div id="other-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#other-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Exclusive Backup and Startup Issues
 &lt;div id="exclusive-backup-and-startup-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#exclusive-backup-and-startup-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Normally, when the database stops and restarts, the startup position comes from &lt;code&gt;pg_controldata&lt;/code&gt;&amp;rsquo;s LSN. But if there&amp;rsquo;s a &lt;code&gt;backup_label&lt;/code&gt; file in PGDATA, the startup LSN is read from &lt;code&gt;backup_label&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;What problems does this cause?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Disk snapshots taken directly on the data directory may include the label file. If the database is large and the backup took a long time, restart can be very slow&lt;/li&gt;
&lt;li&gt;Bigger problem: after a production shutdown from certain causes, restart takes forever. The root cause is the startup LSN coming from the backup rather than controldata&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Version changes:&lt;/p&gt;
&lt;p&gt;PG13:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_start_backup()&lt;/code&gt;
&lt;code&gt;pg_stop_backup()&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Supports exclusive and non-exclusive modes; exclusive is the default. Exclusive mode creates &lt;code&gt;backup_label&lt;/code&gt; in the data directory at start and cleans it at stop. Non-exclusive mode doesn&amp;rsquo;t create the label at start; it returns the label info at stop.&lt;/p&gt;
&lt;p&gt;PG15:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_backup_start()&lt;/code&gt;
&lt;code&gt;pg_backup_stop()&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Function names changed, and &lt;strong&gt;exclusive backup mode was removed&lt;/strong&gt;. No &lt;code&gt;backup_label&lt;/code&gt; is written at backup start; instead it&amp;rsquo;s written to the backup area at backup stop.&lt;/p&gt;

&lt;h3 class="relative group"&gt;pg_stat_activity Unqueryable
 &lt;div id="pg_stat_activity-unqueryable" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_stat_activity-unqueryable" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Symptom:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_stat_activity&lt;/code&gt; hangs and can&amp;rsquo;t be queried.&lt;/p&gt;
&lt;p&gt;pstack at the time:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; pgstat_read_current_status () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgstat.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3642&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000727181 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pgstat_read_current_status () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgstat.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2788&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; pgstat_fetch_stat_numbackends () &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgstat.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2789&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000083f2ee &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; pg_stat_get_activity (fcinfo&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25c2d98) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; pgstatfuncs.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;575&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000065058f &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ExecMakeTableFunctionResult (setexpr&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25b1d28, econtext&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25b1c48, argContext&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, expectedDesc&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2545218, randomAccess&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; execSRF.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000006609dc &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; FunctionNext (node&lt;span style="color:#f92672"&gt;=&lt;/span&gt;node&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25b1b38) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; nodeFunctionscan.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000065110c &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ExecScanFetch (recheckMtd&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x660700 &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;FunctionRecheck&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, accessMtd&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x660720 &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;FunctionNext&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, node&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x25b1b38) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; execScan.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;133&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Analysis:&lt;/p&gt;
&lt;p&gt;The code location is clear — stuck in an infinite loop after &lt;code&gt;st_changecount&lt;/code&gt; becomes odd.&lt;/p&gt;
&lt;p&gt;Triggers: OOM (reproducible), abnormal backend exit (possible), terminate (maybe). None of these guarantee the issue, though.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/5979.1557543440%40sss.pgh.pa.us" target="_blank" rel="noreferrer"&gt;Community thread&lt;/a&gt; didn&amp;rsquo;t reach a conclusion. Currently the trigger probability appears low.&lt;/p&gt;
&lt;p&gt;Solution: restart the database.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Connection and Connection Pooling Issues
 &lt;div id="connection-and-connection-pooling-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#connection-and-connection-pooling-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;IO Error Messages
 &lt;div id="io-error-messages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#io-error-messages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;IO errors typically mean the application is using a connection that&amp;rsquo;s already been closed. This happens often, and diagnosing it is difficult because the entire chain involves many components and broad domain knowledge. Here&amp;rsquo;s a brief summary.&lt;/p&gt;
&lt;p&gt;Known active-disconnection scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;hikari &lt;code&gt;maxLifetime&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Symptom: session lifetime matches the parameter. Possible cause: the application holds an explicit transaction with an uncommitted SELECT, the pool closes the session, and the app gets &lt;code&gt;io error; could not rollback&lt;/code&gt; or similar.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg.datasouce.maxLifetime&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;druid timeout&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Symptom: connection drops after SQL execution exceeds 20s.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spring.datasource.dynamic.druid.socketTimeout=20000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spring.datasource.dynamic.druid.connectTimeout=20000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Change to:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spring.datasource.socketTimeout=3600000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spring.datasource.connectTimeout=3600000&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Application Horizontal Scaling vs. Database Connection Limits
 &lt;div id="application-horizontal-scaling-vs-database-connection-limits" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#application-horizontal-scaling-vs-database-connection-limits" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Horizontal application scaling meets PG connection bottlenecks:&lt;/p&gt;
&lt;p&gt;HikariCP is now Spring Boot&amp;rsquo;s default connection pool. With the proliferation of Spring Boot and microservices, HikariCP usage is widespread. Every pod scaled out increases database connection count. The &lt;code&gt;maximumPoolSize&lt;/code&gt; stays the same per pod, but more nodes mean more total connections. From existing node count, added node count, and current total connections, you can proportionally calculate how many idle connections will be added.&lt;/p&gt;
&lt;p&gt;Applications can scale horizontally without state, but databases cannot. PG&amp;rsquo;s connection limit is &lt;code&gt;max_connections&lt;/code&gt;. Unchecked application scaling can saturate idle connections. Tuning &lt;code&gt;max_connections&lt;/code&gt; is painful because it requires a database restart.&lt;/p&gt;
&lt;p&gt;PG connection upper limit:&lt;/p&gt;
&lt;p&gt;Also, even with unlimited horizontal scaling, &lt;code&gt;max_connections&lt;/code&gt; should adjust with instance class — but there&amp;rsquo;s a real ceiling. In any database, idle connections degrade performance as they increase.&lt;/p&gt;
&lt;p&gt;Refer to &lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Managing.html#AuroraPostgreSQL.Managing.MaxConnections" target="_blank" rel="noreferrer"&gt;AWS&amp;rsquo;s approach&lt;/a&gt;:
&lt;code&gt;max_connections&lt;/code&gt; is tied to instance class, with a maximum of &lt;code&gt;5000, LEAST({DBInstanceClassMemory/9531392}, 5000)&lt;/code&gt;. This reduces manual connection ops and provides a reasonable ceiling.&lt;/p&gt;</content:encoded></item><item><title>PG Shutdown Logic and Walsender Blocking Shutdown Analysis</title><link>https://lastdba.com/en/2025/01/04/pg-shutdown-logic-and-walsender-blocking-shutdown-analysis/</link><pubDate>Sat, 04 Jan 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/01/04/pg-shutdown-logic-and-walsender-blocking-shutdown-analysis/</guid><description>&lt;h2 class="relative group"&gt;Walsender Blocking Shutdown Symptoms
 &lt;div id="walsender-blocking-shutdown-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-blocking-shutdown-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Production shutdown log output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:02.036 CST,,,447560,,65693cde.6d448,1320,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;received fast shutdown request&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:02.295 CST,,,447560,,65693cde.6d448,1322,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;background worker &amp;#34;&amp;#34;logical replication launcher&amp;#34;&amp;#34; (PID 448996) exited with exit code 1&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:10.627 CST,,,448990,,65693ce0.6d9de,213833,,2023-12-01 09:54:40 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpoint complete: wrote 426844 buffers (5.1%); 0 WAL file(s) added, 0 removed, 5 recycled; write=91.427 s, sync=0.055 s, total=91.508 s; sync files=761, longest=0.028 s, average=0.001 s; distance=2197531 kB, estimate=2680783 kB&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:10.628 CST,,,448990,,65693ce0.6d9de,213834,,2023-12-01 09:54:40 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;shutting down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--checkpointer finished checkpoint and is in shutting down state, pm has not exited
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--160s later pm receives immediate shutdown, triggered by health check script
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.348 CST,,,447560,,65693cde.6d448,1323,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;received immediate shutdown request&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,283840,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:39865&amp;#34;&lt;/span&gt;,6751a2dc.454c0,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-12-05 20:55:56 CST,89/847309655,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157641,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:39407&amp;#34;&lt;/span&gt;,67408354.267c9,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:52 CST,9/3193590104,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157916,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:57038&amp;#34;&lt;/span&gt;,67408356.268dc,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:54 CST,115/3293293502,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,164392,&lt;span style="color:#e6db74"&gt;&amp;#34;30.151.40.19:41641&amp;#34;&lt;/span&gt;,66b25869.28228,3,&lt;span style="color:#e6db74"&gt;&amp;#34;streaming 42D3B/1732C5F0&amp;#34;&lt;/span&gt;,2024-08-07 01:07:53 CST,296/0,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;standby_6666&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.371 CST,,,447560,,65693cde.6d448,1324,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;archiver process (PID 448994) exited with exit code 2&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.371 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,57755,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:38918&amp;#34;&lt;/span&gt;,67125534.e19b,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-10-18 20:31:48 CST,243/902018192,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.372 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157915,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:43433&amp;#34;&lt;/span&gt;,67408356.268db,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:54 CST,60/3248014863,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--pm finished shutting down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:57.534 CST,,,447560,,65693cde.6d448,1325,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;database system is shut down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.536 CST,,,211844,,6752bdf3.33b84,1,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;ending log output to stderr&amp;#34;&lt;/span&gt;,,&lt;span style="color:#e6db74"&gt;&amp;#34;Future log output will go to log destination &amp;#34;&amp;#34;csvlog&amp;#34;&amp;#34;.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;17:00:02 postmaster receives fast shutdown&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Walsender Blocking Shutdown Symptoms
 &lt;div id="walsender-blocking-shutdown-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-blocking-shutdown-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Production shutdown log output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:02.036 CST,,,447560,,65693cde.6d448,1320,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;received fast shutdown request&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:02.295 CST,,,447560,,65693cde.6d448,1322,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;background worker &amp;#34;&amp;#34;logical replication launcher&amp;#34;&amp;#34; (PID 448996) exited with exit code 1&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:10.627 CST,,,448990,,65693ce0.6d9de,213833,,2023-12-01 09:54:40 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpoint complete: wrote 426844 buffers (5.1%); 0 WAL file(s) added, 0 removed, 5 recycled; write=91.427 s, sync=0.055 s, total=91.508 s; sync files=761, longest=0.028 s, average=0.001 s; distance=2197531 kB, estimate=2680783 kB&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:00:10.628 CST,,,448990,,65693ce0.6d9de,213834,,2023-12-01 09:54:40 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;shutting down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--checkpointer finished checkpoint and is in shutting down state, pm has not exited
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--160s later pm receives immediate shutdown, triggered by health check script
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.348 CST,,,447560,,65693cde.6d448,1323,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;received immediate shutdown request&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,283840,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:39865&amp;#34;&lt;/span&gt;,6751a2dc.454c0,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-12-05 20:55:56 CST,89/847309655,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157641,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:39407&amp;#34;&lt;/span&gt;,67408354.267c9,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:52 CST,9/3193590104,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157916,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:57038&amp;#34;&lt;/span&gt;,67408356.268dc,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:54 CST,115/3293293502,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.370 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,164392,&lt;span style="color:#e6db74"&gt;&amp;#34;30.151.40.19:41641&amp;#34;&lt;/span&gt;,66b25869.28228,3,&lt;span style="color:#e6db74"&gt;&amp;#34;streaming 42D3B/1732C5F0&amp;#34;&lt;/span&gt;,2024-08-07 01:07:53 CST,296/0,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;standby_6666&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.371 CST,,,447560,,65693cde.6d448,1324,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;archiver process (PID 448994) exited with exit code 2&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.371 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,57755,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:38918&amp;#34;&lt;/span&gt;,67125534.e19b,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-10-18 20:31:48 CST,243/902018192,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:43.372 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;logicaluser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,157915,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.77.159:43433&amp;#34;&lt;/span&gt;,67408356.268db,7,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-11-22 21:12:54 CST,60/3248014863,0,WARNING,57P02,&lt;span style="color:#e6db74"&gt;&amp;#34;terminating connection because of crash of another server process&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;In a moment you should be able to reconnect to the database and repeat your command.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;Debezium Streaming&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;walsender&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--pm finished shutting down
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:57.534 CST,,,447560,,65693cde.6d448,1325,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;database system is shut down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.536 CST,,,211844,,6752bdf3.33b84,1,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;ending log output to stderr&amp;#34;&lt;/span&gt;,,&lt;span style="color:#e6db74"&gt;&amp;#34;Future log output will go to log destination &amp;#34;&amp;#34;csvlog&amp;#34;&amp;#34;.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;17:00:02 postmaster receives fast shutdown&lt;/p&gt;
&lt;p&gt;17:00:10 checkpoint completed, checkpointer stopped&lt;/p&gt;
&lt;p&gt;17:02:43 postmaster receives immediate shutdown&lt;/p&gt;
&lt;p&gt;17:02:43 1 physical and 5 logical replication walsenders stopped&lt;/p&gt;
&lt;p&gt;17:02:57 postmaster stopped&lt;/p&gt;
&lt;p&gt;17:03:49 postmaster receives startup task&lt;/p&gt;
&lt;p&gt;From the above, it&amp;rsquo;s clear that walsender was blocking the shutdown.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Shutdown and Signals
 &lt;div id="shutdown-and-signals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shutdown-and-signals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Before diving into source code, we need to understand signals and signal registration in PG.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Common Signals in PG
 &lt;div id="common-signals-in-pg" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#common-signals-in-pg" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;OS signals:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ kill -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGHUP 2&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGINT 3&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGQUIT 4&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGILL 5&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGTRAP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 6&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGABRT 7&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGBUS 8&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGFPE 9&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGKILL 10&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGUSR1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;11&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGSEGV 12&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGUSR2 13&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGPIPE 14&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGALRM 15&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGTERM
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;16&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGSTKFLT 17&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGCHLD 18&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGCONT 19&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGSTOP 20&lt;span style="color:#f92672"&gt;)&lt;/span&gt; SIGTSTP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Common signals used in PG:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-1&lt;/code&gt; or &lt;code&gt;-SIGHUP&lt;/code&gt;: Hangup signal. In PG, typically tells the process to reload configuration.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-2&lt;/code&gt; or &lt;code&gt;-SIGINT&lt;/code&gt;: Interrupt signal (usually &lt;code&gt;Ctrl+C&lt;/code&gt;). In PG, usually corresponds to cancel command.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-3&lt;/code&gt; or &lt;code&gt;-SIGQUIT&lt;/code&gt;: In PG, usually means forced exit (die).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-9&lt;/code&gt; or &lt;code&gt;-SIGKILL&lt;/code&gt;: Unconditional termination signal.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-15&lt;/code&gt; or &lt;code&gt;-SIGTERM&lt;/code&gt;: Termination signal, the signal used by &lt;code&gt;pg_terminate_backend&lt;/code&gt;. In PG, usually means graceful exit.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-10&lt;/code&gt; or &lt;code&gt;-SIGUSR1&lt;/code&gt;: Custom signal.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-12&lt;/code&gt; or &lt;code&gt;-SIGUSR2&lt;/code&gt;: Custom signal.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-17&lt;/code&gt; or &lt;code&gt;SIGCHLD&lt;/code&gt;: Signal used by the pm process. When a child process exits, pm receives this signal to trigger child process reaping.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The specific meaning of signals registered by each type of PG process can be found by reading the respective process source code.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Shutdown Defined by pg_ctl
 &lt;div id="shutdown-defined-by-pg_ctl" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shutdown-defined-by-pg_ctl" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are several ways to shut down a PG database. At the bottom level, they all boil down to sending a signal to the postmaster process.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;signal&lt;/th&gt;
 &lt;th&gt;pg_ctl&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;SIGTERM&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;Smart Shutdown&lt;/em&gt;&lt;/td&gt;
 &lt;td&gt;Disallow new connections, but allow existing sessions to finish their work normally. Only shuts down after all sessions terminate.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;SIGINT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;Fast Shutdown&lt;/em&gt;&lt;/td&gt;
 &lt;td&gt;Server disallows new connections and sends &lt;strong&gt;SIGTERM&lt;/strong&gt; to all existing child processes, aborting current transactions and exiting quickly. Waits for almost all child processes (some are not needed) to exit, then shuts down.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;SIGQUIT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;Immediate Shutdown&lt;/em&gt;&lt;/td&gt;
 &lt;td&gt;Sends &lt;strong&gt;SIGQUIT&lt;/strong&gt; to all child processes and waits for them to terminate. If any child process has not terminated within 5 seconds, they are sent &lt;strong&gt;SIGKILL&lt;/strong&gt;.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Note: &lt;code&gt;pg_ctl&lt;/code&gt; has no parameter for sending &lt;code&gt;SIGKILL&lt;/code&gt; (&lt;code&gt;kill -9&lt;/code&gt;), but you can send &lt;code&gt;SIGKILL&lt;/code&gt; directly to pm — though it&amp;rsquo;s definitely not recommended. When sending &lt;code&gt;SIGKILL&lt;/code&gt; to pm, pm won&amp;rsquo;t do any cleanup of child processes, shared memory, or semaphores. Since &lt;code&gt;SIGQUIT&lt;/code&gt; to pm has fallback logic for &lt;code&gt;SIGKILL&lt;/code&gt;-ing child processes, &lt;code&gt;SIGQUIT&lt;/code&gt; to pm basically guarantees pm will stop.&lt;/p&gt;
&lt;p&gt;In the source code, there are only 3 &lt;strong&gt;shutdown states&lt;/strong&gt;, corresponding to shutdown modes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Startup/shutdown state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define			NoShutdown		0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define			SmartShutdown	1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define			FastShutdown	2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define			ImmediateShutdown	3&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;These states appear frequently in shutdown routine source code, generally checked via the &lt;code&gt;Shutdown&lt;/code&gt; variable:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; FastShutdown&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;pm Signals
 &lt;div id="pm-signals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pm-signals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When pm receives the corresponding signal, it handles it accordingly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;PostmasterMain&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; argc, &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;argv[])
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGHUP, SIGHUP_handler);	&lt;span style="color:#75715e"&gt;/* reread config file and have
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;											 * children do same */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGINT, pmdie); &lt;span style="color:#75715e"&gt;/* send SIGTERM and shut down */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGQUIT, pmdie);	&lt;span style="color:#75715e"&gt;/* send SIGQUIT and die */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGTERM, pmdie);	&lt;span style="color:#75715e"&gt;/* wait for children and shut down */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGALRM, SIG_IGN);	&lt;span style="color:#75715e"&gt;/* ignored */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGPIPE, SIG_IGN);	&lt;span style="color:#75715e"&gt;/* ignored */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGUSR1, sigusr1_handler);	&lt;span style="color:#75715e"&gt;/* message from child process */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGUSR2, dummy_handler);	&lt;span style="color:#75715e"&gt;/* unused, reserve for children */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal_pm&lt;/span&gt;(SIGCHLD, reaper);	&lt;span style="color:#75715e"&gt;/* handle child termination */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pmdie&lt;/code&gt;: The three shutdown signals call the &lt;code&gt;pmdie&lt;/code&gt; function. &lt;code&gt;pmdie&lt;/code&gt; is the key shutdown function, analyzed in detail below.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;reaper&lt;/code&gt;: During shutdown, handles child process exit cleanup. When a child process exits, it sends &lt;code&gt;SIGCHLD&lt;/code&gt; to pm, which enters &lt;code&gt;reaper&lt;/code&gt; to clean up the child. Each child process cleanup has its own logic — for instance, normal exit of the checkpointer process checks whether archiver and walsender have completed their respective tasks.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sigusr1&lt;/code&gt;, &lt;code&gt;sigusr2&lt;/code&gt;: &lt;code&gt;sigusr1_handler&lt;/code&gt; is the standard routine for &lt;code&gt;SIGUSR1&lt;/code&gt;. Each child process handles &lt;code&gt;SIGUSR1&lt;/code&gt; differently. &lt;code&gt;SIGUSR2&lt;/code&gt; is entirely custom per child process; some child processes don&amp;rsquo;t even register this signal.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Walsender Signals
 &lt;div id="walsender-signals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-signals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When a child process is forked, it first registers signals.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;WalSndSignals&lt;/code&gt; registers signals for the walsender process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Set up signal handlers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndSignals&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Set up signal handlers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGHUP, SignalHandlerForConfigReload);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGINT, StatementCancelHandler);	&lt;span style="color:#75715e"&gt;/* query cancel */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGTERM, die);		&lt;span style="color:#75715e"&gt;/* request shutdown */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGQUIT, quickdie);	&lt;span style="color:#75715e"&gt;/* hard crash time */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;InitializeTimeouts&lt;/span&gt;();		&lt;span style="color:#75715e"&gt;/* establishes SIGALRM handler */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGPIPE, SIG_IGN);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR1, procsignal_sigusr1_handler);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR2, WalSndLastCycleHandler);	&lt;span style="color:#75715e"&gt;/* request a last cycle and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;												 * shutdown */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note &lt;code&gt;SIGUSR1&lt;/code&gt; and &lt;code&gt;SIGUSR2&lt;/code&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Checkpointer Signals
 &lt;div id="checkpointer-signals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checkpointer-signals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;CheckpointerMain&lt;/code&gt; registers checkpointer signals:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CheckpointerMain&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//checkpointer blocks SIGTERM, the actual stop signal is SIGUSR2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGHUP, SignalHandlerForConfigReload);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGINT, ReqCheckpointHandler); &lt;span style="color:#75715e"&gt;/* request checkpoint */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGTERM, SIG_IGN); &lt;span style="color:#75715e"&gt;/* ignore SIGTERM */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGQUIT, SignalHandlerForCrashExit);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGALRM, SIG_IGN);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGPIPE, SIG_IGN);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR1, procsignal_sigusr1_handler);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR2, SignalHandlerForShutdownRequest);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note &lt;code&gt;SIGUSR1&lt;/code&gt; and &lt;code&gt;SIGUSR2&lt;/code&gt;, and also note that checkpointer does not register &lt;code&gt;SIGTERM&lt;/code&gt;.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Shutdown Source Code Analysis
 &lt;div id="shutdown-source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shutdown-source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;pm Signal Handling and State Machine
 &lt;div id="pm-signal-handling-and-state-machine" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pm-signal-handling-and-state-machine" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;pmdie&lt;/code&gt; function handles different postmaster signals, including &lt;code&gt;SIGCHLD&lt;/code&gt; sent by child processes to pm and shutdown signals sent by &lt;code&gt;pg_ctl&lt;/code&gt;. The main logic of pm signal handling is converting the signal into a &lt;code&gt;pmState&lt;/code&gt; state machine state transition, then entering &lt;code&gt;PostmasterStateMachine&lt;/code&gt; for processing.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pmdie&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * pmdie -- signal handler for processing various postmaster signals.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pmdie&lt;/span&gt;(SIGNAL_ARGS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			save_errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (postgres_signal_arg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SIGTERM:&lt;span style="color:#75715e"&gt;//Smart Shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_RUN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				connsAllowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ALLOW_SUPERUSER_CONNS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//smart shutdown does not process pmstate, hands directly to state machine
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;//at this point normal pmState = PM_RUN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SIGINT:&lt;span style="color:#75715e"&gt;//Fast Shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_RUN &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_HOT_STANDBY)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* Report that we&amp;#39;re about to zap live client sessions */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;aborting any active transactions&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_STOP_BACKENDS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;//Fast Shutdown transitions pmstate to PM_STOP_BACKENDS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//then hands to state machine
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SIGQUIT:&lt;span style="color:#75715e"&gt;//Immediate Shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;TerminateChildren&lt;/span&gt;(SIGQUIT);&lt;span style="color:#75715e"&gt;//abort all children with SIGQUIT, wait for them to exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_BACKENDS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* set stopwatch for them to die */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			AbortStartTime &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;time&lt;/span&gt;(NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Immediate Shutdown transitions pmstate to PM_WAIT_BACKENDS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//process children before entering state machine
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//first interrupt children with SIGQUIT, wait for them to exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//then use SIGKILL on remaining children
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//finally non-consistent exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Before entering the state machine handler, let&amp;rsquo;s look at the postmaster states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_INIT,					&lt;span style="color:#75715e"&gt;/* postmaster starting */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_STARTUP,					&lt;span style="color:#75715e"&gt;/* waiting for startup subprocess */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_RECOVERY,				&lt;span style="color:#75715e"&gt;/* in archive recovery mode */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_HOT_STANDBY,				&lt;span style="color:#75715e"&gt;/* in hot standby mode */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_RUN,						&lt;span style="color:#75715e"&gt;/* normal &amp;#34;database is alive&amp;#34; state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_STOP_BACKENDS,			&lt;span style="color:#75715e"&gt;/* need to stop remaining backends */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_WAIT_BACKENDS,			&lt;span style="color:#75715e"&gt;/* waiting for live backends to exit */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_SHUTDOWN,				&lt;span style="color:#75715e"&gt;/* waiting for checkpointer to do shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * ckpt */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_SHUTDOWN_2,				&lt;span style="color:#75715e"&gt;/* waiting for archiver and walsenders to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * finish */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_WAIT_DEAD_END,			&lt;span style="color:#75715e"&gt;/* waiting for dead_end children to exit */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PM_NO_CHILDREN				&lt;span style="color:#75715e"&gt;/* all important children have exited */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} PMState;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since shutdown normally happens from the running state, we only need to focus on states at &lt;code&gt;PM_RUN&lt;/code&gt; and below.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PostmasterStateMachine&lt;/code&gt; execution has a sequential logic:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Advance the postmaster&amp;#39;s state machine and take actions as appropriate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This is common code for pmdie(), reaper() and sigusr1_handler(), which
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * receive the signals that might mean we need to change state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//smart shutdown, pmState should be PM_RUN at this point
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_RUN &lt;span style="color:#f92672"&gt;||&lt;/span&gt; pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_HOT_STANDBY)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (connsAllowed &lt;span style="color:#f92672"&gt;==&lt;/span&gt; ALLOW_NO_CONNS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//After all normal backends exit, transition pmState to PM_STOP_BACKENDS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;CountChildren&lt;/span&gt;(BACKEND_TYPE_NORMAL) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_STOP_BACKENDS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//PM_STOP_BACKENDS stops some core child processes, some will continue running
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//autovacuum, bgwriter, walwriter, startup, walreceiver will stop
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//walsender, checkpointer, archiver, stats, and syslogger will keep running
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//smart shutdown later phase enters this logic, fast shutdown enters directly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_STOP_BACKENDS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//Note this line about walsender!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Signal all backend children except walsenders */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SignalSomeChildren&lt;/span&gt;(SIGTERM,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 BACKEND_TYPE_ALL &lt;span style="color:#f92672"&gt;-&lt;/span&gt; BACKEND_TYPE_WALSND);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* and the autovac launcher too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (AutoVacPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(AutoVacPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* and the bgwriter too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (BgWriterPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(BgWriterPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* and the walwriter too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WalWriterPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(WalWriterPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* If we&amp;#39;re in recovery, also stop startup and walreceiver procs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (StartupPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(StartupPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WalReceiverPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(WalReceiverPID, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* checkpointer, archiver, stats, and syslogger may continue for now */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Transition pmState from PM_STOP_BACKENDS to PM_WAIT_BACKEND
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//PM_WAIT_BACKEND means waiting for backends to exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_BACKENDS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If we are in a state-machine state that implies waiting for backends to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * exit, see if they&amp;#39;re all gone, and change state if so.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//smart shutdown, fast shutdown later phase enters this logic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//immediate shutdown when entering state machine, directly enters this logic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_WAIT_BACKENDS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//During crash recovery and immediate shutdown, checkpointer needs proper exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//archiver, stats, and syslogger don&amp;#39;t need handling since they don&amp;#39;t touch shared memory
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Walsenders also don&amp;#39;t need handling; they exit after checkpoint record is written, just like archiver
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;CountChildren&lt;/span&gt;(BACKEND_TYPE_ALL &lt;span style="color:#f92672"&gt;-&lt;/span&gt; BACKEND_TYPE_WALSND) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			StartupPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalReceiverPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			BgWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			(CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;FatalError &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; Shutdown &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; ImmediateShutdown)) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			AutoVacPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; ImmediateShutdown &lt;span style="color:#f92672"&gt;||&lt;/span&gt; FatalError)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;//ImmediateShutdown waits for dead end processes to finish
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_DEAD_END;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * We already SIGQUIT&amp;#39;d the archiver and stats processes, if
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * any, when we started immediate shutdown or entered
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * FatalError state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//smart, fast shutdown goes here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//regular child processes have all exited, now notify checkpointer to do shutdown checkpoint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; NoShutdown);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;//If checkpointer process doesn&amp;#39;t exist, start one
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					CheckpointerPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartCheckpointer&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* And tell it to shut down */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (CheckpointerPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Send SIGUSR2 to Checkpointer
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//pmState = PM_SHUTDOWN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(CheckpointerPID, SIGUSR2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_SHUTDOWN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Failing to start Checkpointer is a serious problem
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					FatalError &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_DEAD_END;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#75715e"&gt;/* Kill the walsenders, archiver and stats collector too */&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Comment says kill walsender, but it actually doesn&amp;#39;t; at least not via SIGQUIT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;SignalChildren&lt;/span&gt;(SIGQUIT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgArchPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(PgArchPID, SIGQUIT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgStatPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(PgStatPID, SIGQUIT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//The pmdie function and state machine function won&amp;#39;t create PM_SHUTDOWN_2 state, but reaper will
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//When reaper handles checkpointer exit, it sets pmState = PM_SHUTDOWN_2; at the end of reaper, it enters the state machine function, which is here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_SHUTDOWN_2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * PM_SHUTDOWN_2 state ends when there&amp;#39;s no other children than
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * dead_end children left. There shouldn&amp;#39;t be any regular backends
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * left by now anyway; what we&amp;#39;re really waiting for is walsenders and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * archiver.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//PM_SHUTDOWN_2 essentially waits for walsender and archiver
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//only changes pmState
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgArchPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CountChildren&lt;/span&gt;(BACKEND_TYPE_ALL) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_WAIT_DEAD_END;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_WAIT_DEAD_END)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//PM_WAIT_DEAD_END means BackendList is completely empty
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;dlist_is_empty&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;BackendList) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			PgArchPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; PgStatPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* These other guys should be dead already */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(StartupPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(WalReceiverPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(BgWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(WalWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(AutoVacPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* syslogger is not considered here */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_NO_CHILDREN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//PM_NO_CHILDREN is the last shutdown state, meaning normal shutdown can proceed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; NoShutdown &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_NO_CHILDREN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (FatalError)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG, (&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;abnormal database system shutdown&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Abnormal pm exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExitPostmaster&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 	&lt;span style="color:#75715e"&gt;//Normal pm exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExitPostmaster&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;reaper&lt;/code&gt; is the process reaping function. When a child process exits, it sends &lt;code&gt;SIGCHLD&lt;/code&gt; to pm, and pm cleans up the process via the &lt;code&gt;reaper&lt;/code&gt; function. Each process type — backend, startup, checkpointer, etc. — has its own cleanup flow.&lt;/p&gt;
&lt;p&gt;Here we only look at checkpointer cleanup. Also, &lt;code&gt;reaper&lt;/code&gt; has no cleanup logic for walsender:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CheckpointerPID)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			CheckpointerPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Checkpointer exited normally, and pmState is PM_SHUTDOWN: waiting for checkpoint completion
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;EXIT_STATUS_0&lt;/span&gt;(exitstatus) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_SHUTDOWN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * OK, we saw normal exit of the checkpointer after it&amp;#39;s been
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * told to shut down. We expect that it wrote a shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * checkpoint. (If for some reason it didn&amp;#39;t, recovery will
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * occur on next postmaster start.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * At this point we should have no normal backend children
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * left (else we&amp;#39;d not be in PM_SHUTDOWN state) but we might
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * have dead_end children to wait for.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * If we have an archiver subprocess, tell it to do a last
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * archive cycle and quit. Likewise, if we have walsender
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * processes, tell them to send any remaining WAL and quit.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(Shutdown &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; NoShutdown);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;//Wake archiver for the last time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgArchPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(PgArchPID, SIGUSR2); &lt;span style="color:#75715e"&gt;//pgarch SIGUSR2=pgarch_waken_stop
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Wake walsender for the last time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SignalChildren&lt;/span&gt;(SIGUSR2);&lt;span style="color:#75715e"&gt;//walsender SIGUSR2=WalSndLastCycleHandler
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Here PM_SHUTDOWN_2 is set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//At this point Checkpointer has exited normally; we should wait for pgarch and walsender to finish their last task
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//This is PM_SHUTDOWN_2 state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_SHUTDOWN_2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//checkpointer abnormal exit is considered a crash
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;HandleChildCrash&lt;/span&gt;(pid, exitstatus,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#a6e22e"&gt;_&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;checkpointer process&amp;#34;&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//At the end reaper still enters the state machine function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;PostmasterStateMachine&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Checkpointer and Walsender Process Exit
 &lt;div id="checkpointer-and-walsender-process-exit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checkpointer-and-walsender-process-exit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Checkpointer main loop handling requests and shutdown:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CheckpointerMain&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Loop forever
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		do_checkpoint &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			flags &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;pg_time_t&lt;/span&gt;	now;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			elapsed_secs;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			cur_timeout;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Clear any already-pending wakeups */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResetLatch&lt;/span&gt;(MyLatch);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Process any requests or signals received recently.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Process recent sync requests and signals
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;AbsorbSyncRequests&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;HandleCheckpointerInterrupts&lt;/span&gt;();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Checkpointer shutdown function:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Process any new interrupts.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HandleCheckpointerInterrupts&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ShutdownRequestPending)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * From here on, elog(ERROR) should end with exit(1), not send control
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * back to the sigsetjmp block above
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ExitOnAnyError &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShutdownXLOG&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;span style="color:#75715e"&gt;//This writes the shutdown checkpoint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;span style="color:#75715e"&gt;//Normal exit code 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Checkpointer exit needs to wait for &lt;code&gt;ShutdownXLOG&lt;/code&gt; to complete.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ShutdownXLOG&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This must be called ONCE during postmaster or standalone-backend shutdown
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ShutdownXLOG&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; code, Datum arg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//Here&amp;#39;s the checkpointer &amp;#34;shutting down&amp;#34; log, usually always seen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(IsPostmasterEnvironment &lt;span style="color:#f92672"&gt;?&lt;/span&gt; LOG : NOTICE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;shutting down&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Signal walsenders to move to stopping state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Initialize walsender stopping
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;WalSndInitStopping&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Wait for all walsenders to be in stopping state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;WalSndWaitStopping&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RecoveryInProgress&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CreateRestartPoint&lt;/span&gt;(CHECKPOINT_IS_SHUTDOWN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; CHECKPOINT_IMMEDIATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * If archiving is enabled, rotate the last XLOG file so that all the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * remaining records are archived (postmaster wakes up the archiver
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * process one more time at the end of shutdown). The checkpoint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * record will go to the next XLOG file and won&amp;#39;t be archived (yet).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XLogArchivingActive&lt;/span&gt;() &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;XLogArchiveCommandSet&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RequestXLogSwitch&lt;/span&gt;(false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//This is the shutdown checkpoint creation function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CreateCheckPoint&lt;/span&gt;(CHECKPOINT_IS_SHUTDOWN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; CHECKPOINT_IMMEDIATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ShutdownCLOG&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ShutdownCommitTs&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ShutdownSUBTRANS&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ShutdownMultiXact&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Checkpointer notifies all walsenders to begin stopping:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Signal all walsenders to move to stopping state.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This will trigger walsenders to move to a state where no further WAL can be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * generated. See this file&amp;#39;s header for details.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndInitStopping&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; max_wal_senders; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		WalSnd	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;walsnd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;WalSndCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;walsnds[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;pid_t&lt;/span&gt;		pid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SpinLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		pid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SendProcSignal&lt;/span&gt;(pid, PROCSIG_WALSND_INIT_STOPPING, InvalidBackendId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Walsender receives the signal via the &lt;code&gt;SendProcSignal&lt;/code&gt; function, with signal &lt;code&gt;SIGUSR1&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * SendProcSignal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		Send a signal to a Postgres process
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Providing backendId is optional, but it will speed up the operation.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * On success (a signal was sent), zero is returned.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * On error, -1 is returned, and errno is set (typically to ESRCH or EPERM).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Not to be confused with ProcSendSignal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SendProcSignal&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;pid_t&lt;/span&gt; pid, ProcSignalReason reason, BackendId backendId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * BackendId not provided, so search the array using pid. We search
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * the array back to front so as to reduce search overhead. Passing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * InvalidBackendId means that the target is most likely an auxiliary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * process, which will have a slot near the end of the array.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NumProcSignalSlots &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i&lt;span style="color:#f92672"&gt;--&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			slot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ProcSignal&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;psh_slot[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pss_pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; pid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* the above note about race conditions applies here too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* Atomically set the proper flag */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pss_signalFlags[reason] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* Send signal */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;kill&lt;/span&gt;(pid, SIGUSR1);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ESRCH;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Walsender&amp;rsquo;s &lt;code&gt;SIGUSR1&lt;/code&gt; registration:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR1, procsignal_sigusr1_handler);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pqsignal&lt;/span&gt;(SIGUSR2, WalSndLastCycleHandler);	&lt;span style="color:#75715e"&gt;/* request a last cycle and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;												 * shutdown */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;sigusr1 classifies handling by signal reason:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * procsignal_sigusr1_handler - handle SIGUSR1 signal.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;procsignal_sigusr1_handler&lt;/span&gt;(SIGNAL_ARGS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;CheckProcSignal&lt;/span&gt;(PROCSIG_WALSND_INIT_STOPPING))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;HandleWalSndInitStopping&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The handler for &lt;code&gt;PROCSIG_WALSND_INIT_STOPPING&lt;/code&gt; is &lt;code&gt;HandleWalSndInitStopping&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Handle PROCSIG_WALSND_INIT_STOPPING signal.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HandleWalSndInitStopping&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(am_walsender);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If replication has not yet started, die like with SIGTERM. If
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * replication is active, only set a flag and wake up the main loop. It
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * will send any outstanding WAL, wait for it to be replicated to the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * standby, and then exit gracefully.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;replication_active)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;kill&lt;/span&gt;(MyProcPid, SIGTERM);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		got_STOPPING &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;&lt;span style="color:#75715e"&gt;//If walsender is active, initstopping just sets a flag for the main loop to handle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &amp;ldquo;main loop&amp;rdquo; mentioned in the comment is somewhat ambiguous. Walsender has a main loop &lt;code&gt;ServerLoop&lt;/code&gt;, but in reality only the loop in &lt;code&gt;WalSndWaitForWal&lt;/code&gt; has checks for &lt;code&gt;got_STOPPING&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;WalSndWaitForWal&lt;/code&gt; function is the main loop for walsender waiting for new WAL records. Since WAL records are initially generated in memory, walwriter flushes them based on certain conditions, not all the time. &lt;code&gt;WalSndWaitForWal&lt;/code&gt; compares the currently sent LSN with the flushed LSN to determine whether new WAL needs to be sent. In other words, unflushed WAL is not transmitted; only flushed WAL is passed downstream.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;WalSndWaitForWal&lt;/code&gt; code segment about stopping:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Wait till WAL &amp;lt; loc is flushed to disk so it can be safely sent to client.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Returns end LSN of flushed WAL. Normally this will be &amp;gt;= loc, but
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * if we detect a shutdown request (either from postmaster or client)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * we will return early, so caller must always check.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; XLogRecPtr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndWaitForWal&lt;/span&gt;(XLogRecPtr loc)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//After receiving got_STOPPING, do one flush of WAL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//This is necessary! Because walwriter may have already shut down at this point, WAL may not be flushed yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (got_STOPPING)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;XLogBackgroundFlush&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Update our idea of the currently flushed position. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;RecoveryInProgress&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			RecentFlushPtr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetFlushRecPtr&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			RecentFlushPtr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetXLogReplayRecPtr&lt;/span&gt;(NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Break out of the for loop
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//After getting new RecentFlushPtr, still need to send
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (got_STOPPING)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* reactivate latch so WalSndLoop knows to continue */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;SetLatch&lt;/span&gt;(MyLatch);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; RecentFlushPtr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Back to walsender main loop: &lt;code&gt;WalSndLoop(XLogSendLogical)&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Main loop of walsender process that streams the WAL over Copy messages. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndLoop&lt;/span&gt;(WalSndSendDataCallback send_data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Clear any already-pending wakeups */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResetLatch&lt;/span&gt;(MyLatch);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//Process replies from downstream
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ProcessRepliesIfAny&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * If we have received CopyDone from the client, sent CopyDone
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * ourselves, and the output buffer is empty, it&amp;#39;s time to exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * streaming.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Exit loop when streaming is done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (streamingDoneReceiving &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; streamingDoneSending &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;pq_is_send_pending&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//If output buffer has pending data, send it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;pq_is_send_pending&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;send_data&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalSndCaughtUp &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Try to flush pending output to the client */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;pq_flush_if_writable&lt;/span&gt;() &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;WalSndShutdown&lt;/span&gt;();&lt;span style="color:#75715e"&gt;//Downstream not writable, downstream closed, normal walsender shutdown, exit code 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* If nothing remains to be sent right now ... */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WalSndCaughtUp &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;pq_is_send_pending&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * If we&amp;#39;re in catchup state, move to streaming. This is an
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * important state change for users to know about, since before
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * this point data loss might occur if the primary dies and we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * need to failover to the standby. The state change is also
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * important for synchronous replication, since commits that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * started to wait at that point might wait for some time.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Data transmission is done, but commit info still needs to be sent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (MyWalSnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;==&lt;/span&gt; WALSNDSTATE_CATCHUP)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(DEBUG1,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; has now caught up with upstream server&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								application_name)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;WalSndSetState&lt;/span&gt;(WALSNDSTATE_STREAMING);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Received SIGUSR2, meaning shutdown checkpoint is done.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Send the shutdown checkpoint record, wait for completion, then exit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (got_SIGUSR2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;WalSndDone&lt;/span&gt;(send_data);&lt;span style="color:#75715e"&gt;//exit code 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s return to checkpointer&amp;rsquo;s &lt;code&gt;ShutdownXLOG&lt;/code&gt; logic. The above only analyzed &lt;code&gt;WalSndInitStopping()&lt;/code&gt;. After this signal is sent to walsender, &lt;code&gt;WalSndWaitStopping&lt;/code&gt; executes to wait for walsender.&lt;/p&gt;
&lt;p&gt;As long as any walsender hasn&amp;rsquo;t exited, this is an infinite loop that won&amp;rsquo;t return:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Wait that all the WAL senders have quit or reached the stopping state. This
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * is used by the checkpointer to control when the shutdown checkpoint can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * safely be performed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndWaitStopping&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		all_stopped &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; max_wal_senders; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalSnd	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;walsnd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;WalSndCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;walsnds[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SpinLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; WALSNDSTATE_STOPPING)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				all_stopped &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* safe to leave if confirmation is done for all WAL senders */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (all_stopped)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pg_usleep&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;10000L&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* wait for 10 msec */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Finally, combined with the comments in walsender.c:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; If the server is shut down, checkpointer sends us
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; PROCSIG_WALSND_INIT_STOPPING after all regular backends have exited. If
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; the backend is idle or runs an SQL query this causes the backend to
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; shutdown, &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; logical replication is in progress all existing WAL records
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; are processed followed by a shutdown. Otherwise this causes the walsender
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; to &lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; to the &lt;span style="color:#e6db74"&gt;&amp;#34;stopping&amp;#34;&lt;/span&gt; state. In this state, the walsender will reject
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; any further replication commands. The checkpointer begins the shutdown
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; checkpoint once all walsenders are confirmed as stopping. When the shutdown
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; checkpoint finishes, the postmaster sends us SIGUSR2. This instructs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; walsender to send any outstanding WAL, including the shutdown checkpoint
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; record, wait &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; it to be replicated to the standby, and then exit.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;After all regular backends have exited, checkpointer sends &lt;code&gt;PROCSIG_WALSND_INIT_STOPPING&lt;/code&gt; to walsenders&lt;/li&gt;
&lt;li&gt;Walsender may enter the stopping state&lt;/li&gt;
&lt;li&gt;Only after all walsenders enter stopping state does checkpointer perform the shutdown checkpoint&lt;/li&gt;
&lt;li&gt;After the shutdown checkpoint completes, pm sends &lt;code&gt;SIGUSR2&lt;/code&gt; to walsender, which sends any remaining WAL including the shutdown checkpoint record itself, waits for standby to complete, then exits&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Shutdown Flow Diagram
 &lt;div id="shutdown-flow-diagram" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shutdown-flow-diagram" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After going through the source code, it felt like I understood but also didn&amp;rsquo;t — needed a shutdown flowchart to clarify.&lt;/p&gt;
&lt;p&gt;Summary of the fast shutdown flow:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/464a8c3e13dd.png" alt="pg fast停库流程.png" /&gt;&lt;/p&gt;
&lt;p&gt;(High resolution: &lt;a href="https://www.processon.com/view/link/6778a73a04a8344b9502637a" target="_blank" rel="noreferrer"&gt;https://www.processon.com/view/link/6778a73a04a8344b9502637a&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG manages shutdown logic through signals, per-process main loops, PM state machine, and the pmdie process reaping function&lt;/li&gt;
&lt;li&gt;Also note: signals themselves are asynchronous. If you need to wait for the result of signal processing in a target process, you typically need other synchronization mechanisms (pipes, semaphores, shared memory, etc.). PG mainly relies on process dependencies and whether processes exit normally to determine if signals were properly handled.&lt;/li&gt;
&lt;li&gt;pgarch and walsender are treated as the same type of process, handled differently from others (walwriter, bgwriter). pgarch and walsender need to do an additional &amp;ldquo;&lt;strong&gt;last task&lt;/strong&gt;&amp;rdquo;. The signal for the &amp;ldquo;&lt;strong&gt;last task&lt;/strong&gt;&amp;rdquo; is typically defined as SIGUSR2.&lt;/li&gt;
&lt;li&gt;Checkpointer&amp;rsquo;s normal exit depends on pgarch and walsender exiting normally.&lt;/li&gt;
&lt;li&gt;pgarch&amp;rsquo;s last task is the final archive. So archiving can affect shutdown.&lt;/li&gt;
&lt;li&gt;Walsender&amp;rsquo;s second-to-last task is delivering the final WAL, and its last task is delivering the checkpoint shutdown info. These tasks require downstream reply messages, so walsender can affect shutdown.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Test Reproduction
 &lt;div id="test-reproduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-reproduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Test: Reproducing Walsender Blocking Shutdown
 &lt;div id="test-reproducing-walsender-blocking-shutdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-reproducing-walsender-blocking-shutdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After fast stop shutdown, walsender can block the shutdown.&lt;/p&gt;
&lt;p&gt;Tested various scenarios to reproduce walsender blocking shutdown. Currently, the following conditions together make it easier to trigger abnormal shutdown:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One walsender for publication/subscription&lt;/li&gt;
&lt;li&gt;One walsender for DTS&lt;/li&gt;
&lt;li&gt;Large number of subtransactions causing replication slot spill&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This three-in-one scenario doesn&amp;rsquo;t represent the only scenario; it&amp;rsquo;s just one that was easier to reproduce after testing many.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Reproduction commands (not extremely stable reproduction)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;Create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--pg
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpg(id bigserial &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,a char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),b char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--oracle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl.lzloracle(id number &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; ,a char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),b char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;)) tablespace FADATA;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;Set&lt;/span&gt; up &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; logical replication links (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pub&lt;span style="color:#f92672"&gt;/&lt;/span&gt;sub, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; DTS &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; oracle)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.Reduce logical_decoding_work_mem
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logical_decoding_work_mem&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#66d9ef"&gt;Write&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;large&lt;/span&gt; amounts &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; (recommended: subtransaction spill)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Insert one row at a time, each insert as a subtransaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo &lt;span style="color:#e6db74"&gt;&amp;#34;begin;&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;subtx.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;500000&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; echo &lt;span style="color:#e6db74"&gt;&amp;#34;savepoint p$i;&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt;subtx.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; echo &lt;span style="color:#e6db74"&gt;&amp;#34;insert into lzlpg(column1,column2,column3) select &amp;#39;a&amp;#39;,&amp;#39;b&amp;#39;,&amp;#39;c&amp;#39;;&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt;subtx.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;done
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nohup psql &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzl &lt;span style="color:#f92672"&gt;-&lt;/span&gt;f subtx.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.Stop the &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;before&lt;/span&gt; writing completes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl stop &lt;span style="color:#f92672"&gt;-&lt;/span&gt;D &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;PGDATA &lt;span style="color:#f92672"&gt;-&lt;/span&gt;m fast&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At this point, with fast shutdown, the database is in an incomplete shutdown state:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;~/lzl/slot&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ps -axjf|grep &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;150696&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;64964&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;64961&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;146782&lt;/span&gt; pts/42 &lt;span style="color:#ae81ff"&gt;64961&lt;/span&gt; S+ &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; grep --color&lt;span style="color:#f92672"&gt;=&lt;/span&gt;auto &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 /myhost/postgres/base/rasesql1.5.6/bin/postgres -D /myhost/pg8094/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110599&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110599&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;110599&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: logger 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117803&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117803&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117803&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: checkpointer 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117807&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117807&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117807&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: stats collector 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;118563&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;118563&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;118563&lt;/span&gt; ? -1 Rs &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 3:29 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: walsender lzl 127.0.0.1&lt;span style="color:#f92672"&gt;(&lt;/span&gt;62971&lt;span style="color:#f92672"&gt;)&lt;/span&gt; idle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;110402&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;222918&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;222918&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;222918&lt;/span&gt; ? -1 Rs &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 2:59 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: walsender dtssync 30.181.46.203&lt;span style="color:#f92672"&gt;(&lt;/span&gt;57218&lt;span style="color:#f92672"&gt;)&lt;/span&gt; idle&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Walsender, checkpointer, postmaster are all still there; logger and stats haven&amp;rsquo;t exited either.&lt;/p&gt;
&lt;p&gt;The control file state is &lt;code&gt;in production&lt;/code&gt;: meaning running in production, indicating the local shutdown checkpoint by checkpointer didn&amp;rsquo;t complete:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;~/lzl/slot&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pg_controldata|grep -i state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database cluster state: in production&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Checkpointer stack:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pstack &lt;span style="color:#ae81ff"&gt;117803&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00002b879fe0b983 in __select_nocancel () from /lib64/libc.so.6
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00000000008fd04a in pg_usleep (microsec=microsec@entry=10000) at pgsleep.c:56
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00000000007610c8 in WalSndWaitStopping () at walsender.c:3209
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x000000000051fa86 in ShutdownXLOG (code=code@entry=0, arg=arg@entry=0) at xlog.c:8596
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x00000000007215ff in HandleCheckpointerInterrupts () at checkpointer.c:566
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 CheckpointerMain () at checkpointer.c:343
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At this point, checkpointer is stuck in &lt;code&gt;WalSndWaitStopping&lt;/code&gt;, meaning checkpointer is waiting for walsender processes to enter stopping state.&lt;/p&gt;
&lt;p&gt;Walsender stack at this point:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00000000007484fb in ReorderBufferLargestTXN (rb=&amp;lt;optimized out&amp;gt;) at reorderbuffer.c:2345
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 ReorderBufferCheckMemoryLimit (rb=0x2b8808b94118) at reorderbuffer.c:2390
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 ReorderBufferQueueChange (rb=0x2b8808b94118, xid=&amp;lt;optimized out&amp;gt;, lsn=1676456602544, change=change@entry=0x2b87a229f408) at reorderbuffer.c:649
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x000000000073ec99 in DecodeTruncate (buf=&amp;lt;optimized out&amp;gt;, buf=&amp;lt;optimized out&amp;gt;, ctx=&amp;lt;optimized out&amp;gt;) at decode.c:872
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 DecodeHeapOp (buf=0x7ffda7d35180, ctx=0x2b87a224b118) at decode.c:455
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 LogicalDecodingProcessRecord (ctx=0x2b87a224b118, record=&amp;lt;optimized out&amp;gt;) at decode.c:126
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 0x000000000075f502 in XLogSendLogical () at walsender.c:2886
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x0000000000761822 in WalSndLoop (send_data=send_data@entry=0x75f4c0 &amp;lt;XLogSendLogical&amp;gt;) at walsender.c:2287
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Walsender is stuck in the transaction spill function. (&lt;em&gt;Why it&amp;rsquo;s stuck is still unclear!!!&lt;/em&gt;)&lt;/p&gt;
&lt;p&gt;Checkpointer process is blocked in &lt;code&gt;WalSndWaitStopping&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Wait that all the WAL senders have quit or reached the stopping state. This
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * is used by the checkpointer to control when the shutdown checkpoint can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * safely be performed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WalSndWaitStopping&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		all_stopped &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; max_wal_senders; i&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WalSnd	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;walsnd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;WalSndCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;walsnds[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SpinLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; WALSNDSTATE_STOPPING)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				all_stopped &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;walsnd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;mutex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* safe to leave if confirmation is done for all WAL senders */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (all_stopped)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pg_usleep&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;10000L&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* wait for 10 msec */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the code and stack, it&amp;rsquo;s clear the condition &lt;code&gt;walsnd-&amp;gt;state != WALSNDSTATE_STOPPING&lt;/code&gt; is hit, causing the infinite loop.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: Handling the Mid-Shutdown State
 &lt;div id="test-handling-the-mid-shutdown-state" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-handling-the-mid-shutdown-state" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The above is an awkward mid-shutdown state. Besides &lt;code&gt;kill -9&lt;/code&gt;, there are other better ways to achieve consistent shutdown:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Solution 1: Shut down the downstream process&lt;/li&gt;
&lt;li&gt;Solution 2: Send &lt;code&gt;SIGTERM&lt;/code&gt; to walsender&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Solution 1 test:&lt;/p&gt;
&lt;p&gt;When the downstream exits, walsender will also exit:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ProcessRepliesIfAny&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * &amp;#39;X&amp;#39; means that the standby is closing down the socket.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;X&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For pub/sub, execute the following on the subscriber side; even if the upstream is in mid-shutdown state, this will cause walsender to exit:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; SUBSCRIPTION sub_lzl disable;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, this depends on the downstream&amp;rsquo;s own handling; we can&amp;rsquo;t always quickly shut down the downstream receiver process of DTS and other sync tools.&lt;/p&gt;
&lt;p&gt;Solution 2 test:&lt;/p&gt;
&lt;p&gt;Since walsender registers the &lt;code&gt;SIGTERM&lt;/code&gt; signal, and the &lt;code&gt;select pg_terminate_backend($walsender_pid)&lt;/code&gt; command run while the database is running also sends &lt;code&gt;SIGTERM&lt;/code&gt; to walsender, theoretically just sending &lt;code&gt;SIGTERM&lt;/code&gt; to walsender should handle this, without needing &lt;code&gt;kill -9&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Command:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kill &lt;span style="color:#f92672"&gt;-&lt;/span&gt;SIGTERM &lt;span style="color:#ae81ff"&gt;62834&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;same &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; kill &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;62834&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;same &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; kill &lt;span style="color:#ae81ff"&gt;62834&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After normal kill, pm and all other processes exit completely.&lt;/p&gt;
&lt;p&gt;Check the control file and WAL log to confirm consistent shutdown:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;pg_controldata database state changed from &lt;code&gt;in production&lt;/code&gt; to &lt;code&gt;shut down&lt;/code&gt; — consistent shutdown:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_controldata|grep -i state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database cluster state: shut down&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;The last record in the WAL log is &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt;:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump 000000010000018600000012|tail -1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump: fatal: error in WAL record at 186/915D7920: invalid record length at 186/915D7998: wanted 24, got &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: XLOG len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 114/ 114, tx: 0, lsn: 186/915D7920, prev 186/915D78A8, desc: CHECKPOINT_SHUTDOWN redo 186/915D7920; tli 1; prev tli 1; fpw true; xid 0:13431045; oid 3808147; multi 3; offset 6; oldest xid &lt;span style="color:#ae81ff"&gt;485&lt;/span&gt; in DB 1; oldest multi &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; in DB 1; oldest/newest commit timestamp xid: 494/13431044; oldest running xid 0; shutdown&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Test: Reproducing Only Primary Having CHECKPOINT_SHUTDOWN
 &lt;div id="test-reproducing-only-primary-having-checkpoint_shutdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-reproducing-only-primary-having-checkpoint_shutdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A phenomenon in the production environment was that the local WAL had a shutdown checkpoint but the standby didn&amp;rsquo;t. In production, an immediate stop was performed during mid-shutdown, and then startup failed.&lt;/p&gt;
&lt;p&gt;At the time, the last 2 WAL records on primary and standby looked something like:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Primary WAL:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CHECKPOINT_ONLINE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CHECKPOINT_SHUTDOWN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Standby WAL:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CHECKPOINT_ONLINE&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Reproduction commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 1. First reproduce walsender blocking shutdown&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;skipped&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 2. Check the last WAL record&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 188/307ABE00, prev 188/307ABDC8, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;13432445&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;13432444&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;13432445&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 3. pg_ctl stop -D $PGDATA -m i&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 4. Check last WAL record&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Unchanged, same as &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 5. pg_ctl start -D $PGDATA&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## 6. Check last two WAL records&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: Standby len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 50/ 50, tx: 0, lsn: 188/307ABE00, prev 188/307ABDC8, desc: RUNNING_XACTS nextXid &lt;span style="color:#ae81ff"&gt;13432445&lt;/span&gt; latestCompletedXid &lt;span style="color:#ae81ff"&gt;13432444&lt;/span&gt; oldestRunningXid &lt;span style="color:#ae81ff"&gt;13432445&lt;/span&gt; &lt;span style="color:#75715e"&gt;#same as 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: XLOG len &lt;span style="color:#f92672"&gt;(&lt;/span&gt;rec/tot&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 114/ 114, tx: 0, lsn: 188/307ABE38, prev 188/307ABE00, desc: CHECKPOINT_SHUTDOWN redo 188/307ABE38; tli 1; prev tli 1; fpw true; xid 0:13432445; oid 3832732; multi 3; offset 6; oldest xid &lt;span style="color:#ae81ff"&gt;485&lt;/span&gt; in DB 1; oldest multi &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; in DB 1; oldest/newest commit timestamp xid: 494/13432444; oldest running xid 0; shutdown &lt;span style="color:#75715e"&gt;#CHECKPOINT_SHUTDOWN appears&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From this reproduction, &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt; is actually done during &lt;strong&gt;startup&lt;/strong&gt;!&lt;/p&gt;
&lt;p&gt;This matches the production sequence: 1. fast shutdown didn&amp;rsquo;t complete 2. immediate shutdown 3. startup failed.&lt;/p&gt;
&lt;p&gt;Question 1: When during startup is CHECKPOINT_SHUTDOWN done?&lt;/p&gt;
&lt;p&gt;Question 2: When is CHECKPOINT_ONLINE triggered? From reproduction appearances, occasionally fast shutdown results in the last WAL record being CHECKPOINT_ONLINE.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Question 1 analysis:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Doing a shutdown checkpoint at startup easily suggests the startup process. Since we&amp;rsquo;ve previously analyzed the startup process flow, we can directly locate the function &lt;code&gt;StartupXLOG&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This must be called ONCE during postmaster or standalone-backend startup
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;StartupXLOG&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (InRecovery) &lt;span style="color:#75715e"&gt;//Since it was a shutdown stop, instance recovery is needed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Perform a checkpoint to update all our recovery activity to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Note that we write a shutdown checkpoint rather than an on-line
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * one. This is not particularly critical, but since we may be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * assigning a new TLI, using a shutdown checkpoint allows us to have
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * the rule that TLI only changes in shutdown checkpoints, which
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * allows some extra error checking in xlog_redo.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * In fast promotion, only create a lightweight end-of-recovery record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * instead of a full checkpoint. A checkpoint is requested later,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * after we&amp;#39;re fully out of recovery mode and already accepting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * queries.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (bgwriterLaunched) &lt;span style="color:#75715e"&gt;//This if is clearly for standby streaming replication
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#75715e"&gt;//Primary startup goes here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;CreateCheckPoint&lt;/span&gt;(CHECKPOINT_END_OF_RECOVERY &lt;span style="color:#f92672"&gt;|&lt;/span&gt; CHECKPOINT_IMMEDIATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Doing a shutdown checkpoint is intentional, mainly for TLI logic code robustness&lt;/li&gt;
&lt;li&gt;Whenever it&amp;rsquo;s not a consistent shutdown, a shutdown checkpoint is performed during startup&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, doing &lt;code&gt;-m i&lt;/code&gt; forced shutdown and then starting up will also produce &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt; — self-tested.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Question 2 analysis:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Tested multiple times, occasionally seen. Speculation: it just happened that before shutdown, checkpoint conditions were met and an online checkpoint was triggered — pure coincidence.&lt;/p&gt;
&lt;p&gt;Considering that after a failed database shutdown, whether it&amp;rsquo;s a script, HA, or manual intervention, forced shutdown may be done, it&amp;rsquo;s recommended to do at least one checkpoint before shutdown.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Test: Impact of Archiving on Shutdown
 &lt;div id="test-impact-of-archiving-on-shutdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#test-impact-of-archiving-on-shutdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;While analyzing the shutdown code, I also found that after the checkpointer process exits, reaper for checkpointer sends &lt;code&gt;SIGUSR2&lt;/code&gt; to pgarch for its last archive and exit:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;reaper&lt;/span&gt;(SIGNAL_ARGS)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CheckpointerPID)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			CheckpointerPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;EXIT_STATUS_0&lt;/span&gt;(exitstatus) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_SHUTDOWN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* Waken archiver for the last time */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgArchPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;signal_child&lt;/span&gt;(PgArchPID, SIGUSR2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;And pm&amp;rsquo;s exit depends on all processes except syslogger having exited:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pmState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PM_WAIT_DEAD_END)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;dlist_is_empty&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;BackendList) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			PgArchPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; PgStatPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* These other guys should be dead already */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(StartupPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(WalReceiverPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(BgWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(WalWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(AutoVacPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* syslogger is not considered here */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_NO_CHILDREN;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So in production, slow archiving was also found to affect shutdown.&lt;/p&gt;
&lt;p&gt;Reproduction commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;Configure archiving
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;archive_mode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;archive_command &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;/bin/false ;sleep 1000&amp;#39;&lt;/span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;Set&lt;/span&gt; archiving &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; always fail &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; sleep &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; bypass NUM_ARCHIVE_RETRIES logic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;Shutdown
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_ctl stop &lt;span style="color:#f92672"&gt;-&lt;/span&gt;D &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;PGDATA &lt;span style="color:#f92672"&gt;-&lt;/span&gt;m fast&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Processes after shutdown:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ps -axjf|grep &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;72200&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;88406&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;88405&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;68705&lt;/span&gt; pts/48 &lt;span style="color:#ae81ff"&gt;88405&lt;/span&gt; S+ &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; grep --color&lt;span style="color:#f92672"&gt;=&lt;/span&gt;auto &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 /myhost/postgres/base/rasesql1.5.6/bin/postgres -D /myhost/pg8094/data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61772&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61772&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61772&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: logger 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;61470&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63880&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63880&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63880&lt;/span&gt; ? -1 Ss &lt;span style="color:#ae81ff"&gt;6001&lt;/span&gt; 0:00 &lt;span style="color:#ae81ff"&gt;\_&lt;/span&gt; postgres: lzlpg: archiver archiving &lt;span style="color:#ae81ff"&gt;000000010000018800000007&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since the checkpointer here has already fully stopped, the database is in a consistent state, so using &lt;code&gt;kill -9&lt;/code&gt; on archiver is fine.&lt;/p&gt;

&lt;h2 class="relative group"&gt;One-Sentence Summary
 &lt;div id="one-sentence-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#one-sentence-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Q1: Why didn&amp;rsquo;t shutdown complete?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Walsender blocked shutdown. Checkpointer sent SIGUSR1 to walsender and infinitely waited for all walsender processes to enter stopping state; checkpointer got stuck at this step.&lt;/p&gt;
&lt;p&gt;The shutdown eventually completed due to &lt;code&gt;-m i&lt;/code&gt; forced shutdown.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q2: Is there a graceful way to shut down from the mid-shutdown state caused by walsender blocking?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Yes. Send &lt;code&gt;SIGTERM&lt;/code&gt; (i.e. &lt;code&gt;kill&lt;/code&gt;, or &lt;code&gt;kill -15&lt;/code&gt;, &lt;code&gt;kill -SIGTERM&lt;/code&gt;) to all walsenders. Afterwards, checkpointer and postmaster will complete a clean shutdown.&lt;/p&gt;
&lt;p&gt;Walsender registers the &lt;code&gt;SIGTERM&lt;/code&gt; signal at startup, and testing shows no scenario where it can&amp;rsquo;t be handled.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SIGTERM&lt;/code&gt; is also the signal sent by &lt;code&gt;pg_terminate_backend(pid)&lt;/code&gt;, and it&amp;rsquo;s the command that should be executed to stop walsender during a standard shutdown.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q3: Why did primary and standby differ by exactly one &lt;code&gt;shutdown checkpoint&lt;/code&gt;?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;3.1 Explanation for both primary and standby having &lt;code&gt;CHECKPOINT_ONLINE&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The primary triggering &lt;code&gt;CHECKPOINT_ONLINE&lt;/code&gt; was purely coincidental&lt;/li&gt;
&lt;li&gt;Since the physical walsender was still there, this WAL record was transmitted to the standby&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;3.2 Explanation for only primary having &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;This &lt;code&gt;CHECKPOINT_SHUTDOWN&lt;/code&gt; was done during primary startup&lt;/li&gt;
&lt;li&gt;Since the primary hadn&amp;rsquo;t fully started, this WAL record wasn&amp;rsquo;t transmitted to the standby&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Q4: Why does archiver block shutdown?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When reaping the checkpointer process, pm tells archiver to do one last archive, and pm depends on all processes except syslogger having exited. So if the last archive is slow or has issues, it blocks shutdown. Archive failure won&amp;rsquo;t — the archiver process exits quickly on failure.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Q5: Which processes can block shutdown?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Actually, any process not exiting can block shutdown. The question is which ones are more likely to cause trouble. From the shutdown code flow, archiver and walsender commonly block shutdown because they perform a last archive or log transmission during the shutdown phase.&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/server-shutdown.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/server-shutdown.html&lt;/a&gt;
&lt;a href="https://wiki.postgresql.org/wiki/Signals" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Signals&lt;/a&gt;
postgres.c
postmaster.c
walsender.c
xlog.c
checkpointer.c
startup.c
pgarch.c&lt;/p&gt;</content:encoded></item><item><title>PG Startup Logic and Spill-Caused Slow Startup Analysis</title><link>https://lastdba.com/en/2025/01/04/pg-startup-logic-and-spill-caused-slow-startup-analysis/</link><pubDate>Sat, 04 Jan 2025 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2025/01/04/pg-startup-logic-and-spill-caused-slow-startup-analysis/</guid><description>&lt;h2 class="relative group"&gt;Problem Symptom — Slow Startup
 &lt;div id="problem-symptom--slow-startup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptom--slow-startup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Version: PG 13.2&lt;/p&gt;
&lt;p&gt;Database startup was slow. The startup process was reading spill files, and the filenames kept changing. Checking the spill files was also very slow — &lt;code&gt;ls -l&lt;/code&gt; eventually showed 8 million spill files.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Tens of Millions of Spill Files?
 &lt;div id="why-tens-of-millions-of-spill-files" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-tens-of-millions-of-spill-files" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;WAL Segment and LSN Meaning
 &lt;div id="wal-segment-and-lsn-meaning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal-segment-and-lsn-meaning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;LSN
 &lt;div id="lsn" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsn" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;LSN is a 64-bit bigint. An LSN actually looks like &lt;code&gt;42D3B/1732C540&lt;/code&gt; (hex). Before the slash &lt;code&gt;/&lt;/code&gt; is the 32-bit logical log number, and after the &lt;code&gt;/&lt;/code&gt; are 32 bits split into segment number + block number + intra-block offset. These 4 parts are:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Symptom — Slow Startup
 &lt;div id="problem-symptom--slow-startup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptom--slow-startup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Version: PG 13.2&lt;/p&gt;
&lt;p&gt;Database startup was slow. The startup process was reading spill files, and the filenames kept changing. Checking the spill files was also very slow — &lt;code&gt;ls -l&lt;/code&gt; eventually showed 8 million spill files.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Tens of Millions of Spill Files?
 &lt;div id="why-tens-of-millions-of-spill-files" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-tens-of-millions-of-spill-files" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;WAL Segment and LSN Meaning
 &lt;div id="wal-segment-and-lsn-meaning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal-segment-and-lsn-meaning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;LSN
 &lt;div id="lsn" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsn" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;LSN is a 64-bit bigint. An LSN actually looks like &lt;code&gt;42D3B/1732C540&lt;/code&gt; (hex). Before the slash &lt;code&gt;/&lt;/code&gt; is the 32-bit logical log number, and after the &lt;code&gt;/&lt;/code&gt; are 32 bits split into segment number + block number + intra-block offset. These 4 parts are:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;32 bits&lt;/th&gt;
 &lt;th&gt;8 bits&lt;/th&gt;
 &lt;th&gt;11 bits&lt;/th&gt;
 &lt;th&gt;13 bits&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Logical log number&lt;/td&gt;
 &lt;td&gt;Log segment number&lt;/td&gt;
 &lt;td&gt;Block number&lt;/td&gt;
 &lt;td&gt;Intra-block offset&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Intra-block offset 8192 = 2^13&lt;/p&gt;
&lt;p&gt;Block number = 16M (default WAL segment size) / 8192&lt;/p&gt;

&lt;h4 class="relative group"&gt;WAL Segment
 &lt;div id="wal-segment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wal-segment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;A WAL filename consists of 3 groups of hex digits.&lt;/p&gt;
&lt;p&gt;Taking the 8k WAL file &lt;code&gt;0000000300042D3B00000002&lt;/code&gt; as example:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;32 bits&lt;/th&gt;
 &lt;th&gt;32 bits&lt;/th&gt;
 &lt;th&gt;32 bits&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;timeline&lt;/td&gt;
 &lt;td&gt;Logical log number&lt;/td&gt;
 &lt;td&gt;Log segment number&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;00000003&lt;/td&gt;
 &lt;td&gt;00042D3B&lt;/td&gt;
 &lt;td&gt;00000002&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It can be seen that an LSN can locate a WAL filename and the offset position within the file.&lt;/p&gt;
&lt;p&gt;Among these, the part before the LSN slash &lt;code&gt;/&lt;/code&gt; is the logical log number, and the 8-bit log segment number after the slash &lt;code&gt;/&lt;/code&gt; will be used below.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Spill Filename Conversion
 &lt;div id="spill-filename-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spill-filename-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Replication slot name: logical_ex2209_rep&lt;/p&gt;
&lt;p&gt;Spill filename: xid-407989064-lsn-42D1E-20000000.spill&lt;/p&gt;
&lt;p&gt;42D1E is not a complete LSN and cannot be directly used with &lt;code&gt;pg_walfile_name&lt;/code&gt; to locate a WAL filename. 42D1E is a logical log number. If we directly filter WAL files containing 42D1E in the name, we find 16 WAL files.&lt;/p&gt;
&lt;p&gt;Can we locate the WAL log segment number from the number 20000000 to pinpoint the exact file?&lt;/p&gt;
&lt;p&gt;Spill filename generation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Given a replication slot, transaction ID and segment number, fill in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * corresponding spill file into &amp;#39;path&amp;#39;, which is a caller-owned buffer of size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * at least MAXPGPATH.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializedPath&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;path, ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot, TransactionId xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; XLogSegNo segno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; XLogRecPtr recptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;XLogSegNoOffsetToRecPtr&lt;/span&gt;(segno, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, wal_segment_size, recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(path, MAXPGPATH, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s/xid-%u-lsn-%X-%X.spill&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(MyReplicationSlot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.name),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (uint32) (recptr &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;), (uint32) recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;pg_replslot/%s&lt;/code&gt; and &lt;code&gt;xid-%u-lsn&lt;/code&gt; parts are easy to understand — just the replication slot name and xid. The &lt;code&gt;recptr&lt;/code&gt; needs a closer look at its definition:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Pointer to a location in the XLOG. These pointers are 64 bits wide,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * because we don&amp;#39;t want them ever to overflow.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; uint64 XLogRecPtr;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;XLogSegNoOffsetToRecPtr&lt;/code&gt; calculates the LSN from the WAL log segment number, segment size, and offset:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XLogSegNoOffsetToRecPtr(segno, offset, wal_segsz_bytes, dest) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; (dest) = (segno) * (wal_segsz_bytes) + (offset)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;XLogRecPtr is the LSN! So:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;(uint32) (recptr &amp;gt;&amp;gt; 32)&lt;/code&gt; takes the first 32 bits of LSN, &lt;code&gt;(uint32) recptr)&lt;/code&gt; takes the last 32 bits.&lt;/p&gt;
&lt;p&gt;The first 32 bits of LSN is what we saw as the first half of LSN, lsn-42D1E. The last 32 bits of LSN actually contain more information; here we only need the first few bits of the last 32 bits — the segment number.&lt;/p&gt;
&lt;p&gt;Since the passed-in offset=0 and we also have segno, we don&amp;rsquo;t actually need the intra-block offset information to calculate the dest value. The real value of wal_segsz_bytes is 128M = 128*1024*1024. Converting the formula in &lt;code&gt;XLogSegNoOffsetToRecPtr&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;segno&lt;span style="color:#f92672"&gt;=&lt;/span&gt; dest&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Convert hex 20000000
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;segno&lt;span style="color:#f92672"&gt;=&lt;/span&gt; x&lt;span style="color:#e6db74"&gt;&amp;#39;20000000&amp;#39;&lt;/span&gt;::int&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;segno&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From this formula we can derive the log segment number segno, which lets us locate the WAL file number.&lt;/p&gt;
&lt;p&gt;So the spill filename xid-407989064-lsn-42D1E-20000000.spill corresponds to the WAL file:&lt;/p&gt;
&lt;p&gt;Logical log number=42D1E, segment number=04:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ls 42D1E*04
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000200042D1E00000004&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;pg_waldump shows xid 407989064 inside.&lt;/p&gt;
&lt;p&gt;In practice, the WAL size is also fixed after instance creation, i.e. (128*1024*1024) is a constant, so segno is absolutely correlated with (uint32) recptr, but not equal to it. This means that switching to a new WAL log file creates a new spill file.&lt;/p&gt;
&lt;p&gt;Summary of &lt;strong&gt;spill file generation rules&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Same transaction id: if it spans multiple WAL files, it produces multiple spills. E.g., a large transaction without subtransactions spanning 3 WAL files produces 3 spill files.&lt;/li&gt;
&lt;li&gt;Different transaction ids produce different spills. E.g., 10 million subtransactions produce 10 million spill files.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Spill filename structure xid-407989064-lsn-42D1E-20000000.spill:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;xid&lt;/th&gt;
 &lt;th&gt;First 32 bits of LSN; i.e., WAL logical log number&lt;/th&gt;
 &lt;th&gt;Converted from WAL log segment number; not equal to segment number&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;xid-407989064&lt;/td&gt;
 &lt;td&gt;lsn-42D1E&lt;/td&gt;
 &lt;td&gt;20000000&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Recovered environment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |head -100
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;40000276&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;184&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 15:20 state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;196&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:25 xid-407989064-lsn-42D1E-0.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;208&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:25 xid-407989064-lsn-42D1E-20000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;540&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 16:44 xid-407989064-lsn-42D2A-D0000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989065-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989066-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989068-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989070-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989072-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989076-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989079-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989080-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 13:09 xid-407989082-lsn-42D1D-C8000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzlhost /myhost/liuzhilong/pg_replslot/logical_ex9e15_rep&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $9}&amp;#39;&lt;/span&gt;|awk -F &lt;span style="color:#e6db74"&gt;&amp;#39;-&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;{print $2}&amp;#39;&lt;/span&gt;|sort|uniq -c|wc -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;10000003&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzlhost /myhost/liuzhilong/pg_replslot/logical_ex9e15_rep&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |wc -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;10000070&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So in the actual environment we saw 10,000,070 files, with 10,000,003 distinct xids among them — meaning 1 parent transaction spanning about 70 WAL files, with this parent transaction having 10 million subtransactions.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Replication Slot Spill Testing
 &lt;div id="replication-slot-spill-testing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replication-slot-spill-testing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Pub/sub replication link setup
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logical_decoding_work_mem &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;MB &lt;span style="color:#f92672"&gt;#&lt;/span&gt;pg_ctl reload
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_segment_size &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--source
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; replication_table (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id BIGSERIAL &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column1 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column2 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column3 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; publication pub_test &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; replication_table ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--dest
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; replication_table (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id BIGSERIAL &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column1 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column2 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column3 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SUBSCRIPTION sub_test
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CONNECTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;host=127.0.0.1 port=8094 dbname=lzl user=lzl password=qwer&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PUBLICATION pub_test;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--source
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Large Transaction, No Subtransactions, Replicated Table Spill Test
 &lt;div id="large-transaction-no-subtransactions-replicated-table-spill-test" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#large-transaction-no-subtransactions-replicated-table-spill-test" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create a large transaction, don&amp;#39;t commit yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; replication_table(column1,column2,column3) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;c&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Replication slot spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;331924&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 184 Dec 9 20:22 state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 88226964 Dec 9 20:22 xid-5074343-lsn-163-38000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 119698488 Dec 9 20:22 xid-5074343-lsn-163-40000000.spill&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After the large transaction commits, wait for consumption until replication lag is 0, and the spill files disappear:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,sent_lsn,write_lsn,flush_lsn,replay_lsn,write_lag,flush_lag,replay_lag,reply_time &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sent_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; write_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replay_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; write_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; flush_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replay_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reply_time 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+--------------+--------------+--------------+--------------+-----------+-----------+------------+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;163525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;163&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4996&lt;/span&gt;E1C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;163&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4996&lt;/span&gt;E1C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;163&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4996&lt;/span&gt;E1C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;163&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4996&lt;/span&gt;E1C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14769&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,pg_wal_lsn_diff(pg_current_wal_lsn(),sent_lsn) diff_sent_mb,pg_wal_lsn_diff(pg_current_wal_lsn(),write_lsn) diff_write_mb,pg_wal_lsn_diff(pg_current_wal_lsn(),flush_lsn) diff_flush_mb,pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) diff_replay_mb,pg_walfile_name_offset(sent_lsn) sentoffset,pg_walfile_name_offset(write_lsn) writeoffset,pg_walfile_name_offset(flush_lsn) flush_lsn &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; diff_sent_mb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; diff_write_mb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; diff_flush_mb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; diff_replay_mb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sentoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; writeoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; flush_lsn 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+--------------+---------------+---------------+----------------+-------------------------------------+-------------------------------------+-------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;163525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000016300000009&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;26665416&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000016300000009&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;26665416&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#f92672"&gt;/&lt;/span&gt;mypg&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg8094&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_replslot&lt;span style="color:#f92672"&gt;/&lt;/span&gt;sub_test]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;357392&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 184 Dec 9 20:23 state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 88226964 Dec 9 20:22 xid-5074343-lsn-163-38000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 137696328 Dec 9 20:23 xid-5074343-lsn-163-40000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 26076708 Dec 9 20:23 xid-5074343-lsn-163-48000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#f92672"&gt;/&lt;/span&gt;mypg&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg8094&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_replslot&lt;span style="color:#f92672"&gt;/&lt;/span&gt;sub_test]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 184 Dec 9 20:25 state2666
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Large Transaction, No Subtransactions, Non-Replicated Table Spill Test
 &lt;div id="large-transaction-no-subtransactions-non-replicated-table-spill-test" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#large-transaction-no-subtransactions-non-replicated-table-spill-test" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--source: create an unrelated table for writing data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; no_replication_table (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id BIGSERIAL &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column1 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column2 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; column3 char(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create a large transaction, don&amp;#39;t commit yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; no_replication_table(column1,column2,column3) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;c&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzldb:MYINST:&lt;span style="color:#ae81ff"&gt;8094&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt;mypg&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg8094&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_replslot&lt;span style="color:#f92672"&gt;/&lt;/span&gt;sub_test]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;357492&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 184 Dec 9 20:09 state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 107511456 Dec 9 20:08 xid-5074106-lsn-163-28000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 137698804 Dec 9 20:09 xid-5074106-lsn-163-30000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 4308444 Dec 9 20:09 xid-5074106-lsn-163-38000000.spill&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Large Transaction, Subtransactions, Non-Replicated Table Spill Test
 &lt;div id="large-transaction-subtransactions-non-replicated-table-spill-test" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#large-transaction-subtransactions-non-replicated-table-spill-test" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## One insert per row, each insert as one subtransaction&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo &lt;span style="color:#e6db74"&gt;&amp;#34;begin;&amp;#34;&lt;/span&gt;&amp;gt;subtx.sql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i in &lt;span style="color:#f92672"&gt;{&lt;/span&gt;1..1000000&lt;span style="color:#f92672"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; echo &lt;span style="color:#e6db74"&gt;&amp;#34;savepoint p&lt;/span&gt;$i&lt;span style="color:#e6db74"&gt;;&amp;#34;&lt;/span&gt;&amp;gt;&amp;gt;subtx.sql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; echo &lt;span style="color:#e6db74"&gt;&amp;#34;insert into no_replication_table(column1,column2,column3) select &amp;#39;a&amp;#39;,&amp;#39;b&amp;#39;,&amp;#39;c&amp;#39;;&amp;#34;&lt;/span&gt;&amp;gt;&amp;gt;subtx.sql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nohup psql -d lzl -f subtx.sql &amp;amp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#During execution, observed 800k+ spill files&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;/myhost/pg8094/data/pg_replslot/sub_test&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |wc -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;823749&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;/myhost/pg8094/data/pg_replslot/sub_test&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll |head -10
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;1099532&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;184&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:10 state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;1236&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:10 xid-5519686-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519687-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519688-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519689-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519690-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519691-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519692-lsn-163-70000000.spill
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt; Dec &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 xid-5519693-lsn-163-70000000.spill&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Analysis of Slow Database Startup
 &lt;div id="analysis-of-slow-database-startup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis-of-slow-database-startup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Startup Process Startup Flow Analysis
 &lt;div id="startup-process-startup-flow-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#startup-process-startup-flow-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Here we parse the startup flow frame by frame using the call stack:&lt;/p&gt;
&lt;p&gt;11: &lt;code&gt;main&lt;/code&gt;: Nothing to say.&lt;/p&gt;
&lt;p&gt;10: &lt;code&gt;PostmasterMain&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;Before the main loop, it first calls the startup flow &lt;code&gt;StartupPID = StartupDataBase();&lt;/code&gt; which essentially calls &lt;code&gt;StartChildProcess(StartupProcess)&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define StartupDataBase()		StartChildProcess(StartupProcess)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;9: &lt;code&gt;StartChildProcess&lt;/code&gt;: Forks a process. This process is the auxiliary process for starting postmaster; normal child process startup goes through this logic, forking at this step. The input &lt;code&gt;AuxProcType&lt;/code&gt;=StartupProcess.&lt;/p&gt;
&lt;p&gt;8: &lt;code&gt;AuxiliaryProcessMain&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;Since &lt;code&gt;MyAuxProcType&lt;/code&gt;=StartupProcess, it goes through the &lt;code&gt;StartupProcessMain&lt;/code&gt; flow, which is different from child processes like &lt;strong&gt;walsender&lt;/strong&gt;, walwriter, bgwriter. The startup process itself exists for crash recovery WAL reading, but it does many other things:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (MyAuxProcType)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; CheckerProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, they&amp;#39;re useless here */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;CheckerModeMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; BootstrapProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * There was a brief instant during which mode was Normal; this is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * okay. We need to be in bootstrap mode during BootStrapXLOG for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * the sake of multixact initialization.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SetProcessingMode&lt;/span&gt;(BootstrapProcessing);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;bootstrap_signals&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;BootStrapXLOG&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;BootstrapModeMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; StartupProcess: &lt;span style="color:#75715e"&gt;//Here here here here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, startup process has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;StartupProcessMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; BgWriterProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, bgwriter has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;BackgroundWriterMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; CheckpointerProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, checkpointer has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;CheckpointerMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; WalWriterProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, walwriter has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;InitXLOGAccess&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;WalWriterMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; WalReceiverProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* don&amp;#39;t set signals, walreceiver has its own agenda */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;WalReceiverMain&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);		&lt;span style="color:#75715e"&gt;/* should never return */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;default&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(PANIC, &lt;span style="color:#e6db74"&gt;&amp;#34;unrecognized process type: %d&amp;#34;&lt;/span&gt;, (&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;) MyAuxProcType);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;proc_exit&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;7: &lt;code&gt;StartupProcessMain&lt;/code&gt;: Mainly to call &lt;code&gt;StartupXLOG()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;6: &lt;code&gt;StartupXLOG&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;Function comment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;This must be called ONCE during postmaster or standalone&lt;span style="color:#f92672"&gt;-&lt;/span&gt;backend startup&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;StartupXLOG&lt;/code&gt; is always called by postmaster regardless, whether crash shutdown or consistent shutdown:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (ControlFile&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; DB_IN_PRODUCTION:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;database system was interrupted; last known up at %s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							&lt;span style="color:#a6e22e"&gt;str_time&lt;/span&gt;(ControlFile&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;time))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This matches the log output. Here&amp;rsquo;s the shutdown and startup log:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:02:57.534 CST,,,447560,,65693cde.6d448,1325,,2023-12-01 09:54:38 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;database system is shut down&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.536 CST,,,211844,,6752bdf3.33b84,1,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;ending log output to stderr&amp;#34;&lt;/span&gt;,,&lt;span style="color:#e6db74"&gt;&amp;#34;Future log output will go to log destination &amp;#34;&amp;#34;csvlog&amp;#34;&amp;#34;.&amp;#34;&lt;/span&gt;,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.536 CST,,,211844,,6752bdf3.33b84,2,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;starting PostgreSQL 13.2 (RaseSQL 1.3) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39.0.1), 64-bit&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.537 CST,,,211844,,6752bdf3.33b84,3,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;listening on IPv4 address &amp;#34;&amp;#34;0.0.0.0&amp;#34;&amp;#34;, port 7284&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.539 CST,,,211844,,6752bdf3.33b84,4,,2024-12-06 17:03:47 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;listening on Unix socket &amp;#34;&amp;#34;/tmp/.s.PGSQL.7284&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-12-06 17:03:49.557 CST,,,211995,,6752bdf5.33c1b,1,,2024-12-06 17:03:49 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;database system was interrupted; last known up at 2024-12-06 17:00:10 CST&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;startup&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So, after shutdown, the control file recorded the database state as &lt;code&gt;in production&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database cluster state: in production&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;in production&lt;/code&gt; state means &lt;strong&gt;the database is running&lt;/strong&gt;, not a normal shutdown state — indicating that at the time of shutdown, it was &lt;strong&gt;not a consistent shutdown&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Continuing with the key code about fsync:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If we previously crashed, perform a couple of actions:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * - The pg_wal directory may still include some temporary WAL segments
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * used when creating a new segment, so perform some clean up to not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * bloat this path. This is done first as there is no point to sync
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * this temporary data.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * - There might be data which we had written, intending to fsync it, but
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * which we had not actually fsync&amp;#39;d yet. Therefore, a power failure in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the near future might cause earlier unflushed writes to be lost, even
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * though more recent data written to disk from here on would be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * persisted. To avoid that, fsync the entire data directory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ControlFile&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; DB_SHUTDOWNED &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ControlFile&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; DB_SHUTDOWNED_IN_RECOVERY)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RemoveTempXlogFiles&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SyncDataDirectory&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here, because the control file state is not a normal shutdown, it enters the if-block and calls &lt;code&gt;SyncDataDirectory()&lt;/code&gt; for fsync persistence.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;StartupXLOG&lt;/code&gt; does many many things. Among those related to spill, besides &lt;code&gt;SyncDataDirectory()&lt;/code&gt;, there&amp;rsquo;s also &lt;code&gt;StartupReorderBuffer()&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Initialize replication slots, before there&amp;#39;s a chance to remove
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * required resources.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;StartupReplicationSlots&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Startup logical state, needs to be setup now so we have proper data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * during crash recovery.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;StartupReorderBuffer&lt;/span&gt;();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;StartupReorderBuffer&lt;/code&gt; is also called. It calls &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt; to clean up spill files in all slot directories (but does not delete directories or state files):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Delete all data spilled to disk after we&amp;#39;ve restarted/crashed. It will be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * recreated when the respective slots are reused.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;StartupReorderBuffer&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DIR		 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;logical_dir;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dirent &lt;span style="color:#f92672"&gt;*&lt;/span&gt;logical_de;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	logical_dir &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocateDir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; ((logical_de &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReadDir&lt;/span&gt;(logical_dir, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;..&amp;#34;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* if it cannot be a slot, skip the directory */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;ReplicationSlotValidateName&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, DEBUG2))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * ok, has to be a surviving logical slot, iterate and delete
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * everything starting with xid-*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ReorderBufferCleanupSerializedTXNs&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;FreeDir&lt;/span&gt;(logical_dir);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;5: &lt;code&gt;SyncDataDirectory&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;The function comment is very important:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Issue fsync recursively on PGDATA and all its contents.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * We fsync regular files and directories wherever they are, but we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * follow symlinks only for pg_wal and immediately under pg_tblspc.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Other symlinks are presumed to point at files we&amp;#39;re not responsible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * for fsyncing, and might not have privileges to write at all.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Errors are logged but not considered fatal; that&amp;#39;s because this is used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * only during database startup, to deal with the possibility that there are
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * issued-but-unsynced writes pending against the data directory. We want to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * ensure that such writes reach disk before anything that&amp;#39;s done in the new
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * run. However, aborting on error would result in failure to start for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * harmless cases such as read-only files in the data directory, and that&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * not good either.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Note that if we previously crashed due to a PANIC on fsync(), we&amp;#39;ll be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * rewriting all changes again during recovery.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Note we assume we&amp;#39;re chdir&amp;#39;d into PGDATA to begin with.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;fsync all data directory files to persist them&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;This action only happens during the startup phase&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;This action ensures the data directory is fully persistent before the database starts running&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The body of &lt;code&gt;SyncDataDirectory&lt;/code&gt; recursively walks directories and fsyncs (with some special handling for symlinks):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;walkdir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;, datadir_fsync_fname, false, LOG);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (xlog_is_symlink)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;walkdir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_wal&amp;#34;&lt;/span&gt;, datadir_fsync_fname, false, LOG);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;walkdir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_tblspc&amp;#34;&lt;/span&gt;, datadir_fsync_fname, true, LOG);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;4: &lt;code&gt;walkdir&lt;/code&gt;: Recurse to &lt;code&gt;.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;3: &lt;code&gt;walkdir&lt;/code&gt;: Recurse to &lt;code&gt;./pg_replslot&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;2: &lt;code&gt;walkdir&lt;/code&gt;: Recurse to &lt;code&gt;./pg_replslot/slotname&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;1: &lt;code&gt;lstat&lt;/code&gt;: C library call. &lt;code&gt;walkdir&lt;/code&gt; not only does fsync (via the callback &lt;code&gt;datadir_fsync_fname&lt;/code&gt;), the &lt;code&gt;walkdir&lt;/code&gt; function body also does &lt;code&gt;lstat&lt;/code&gt; to get file info such as inode, file size, last modification time, etc. — similar to the Linux &lt;code&gt;stat&lt;/code&gt; command.&lt;/p&gt;
&lt;p&gt;0: &lt;code&gt;_lxstat&lt;/code&gt;: C library call.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Startup logic summary&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG starts an auxiliary process &lt;code&gt;startup&lt;/code&gt; to help with startup. Unlike common child processes (walwriter, bgwriter, checkpointer, etc.), it&amp;rsquo;s always started during the startup process and does many things.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;StartupXLOG&lt;/code&gt; is always called during startup, whether or not the database was consistently shut down.&lt;/li&gt;
&lt;li&gt;Only in a non-normal shutdown state does &lt;code&gt;SyncDataDirectory&lt;/code&gt; get triggered.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SyncDataDirectory&lt;/code&gt; fsyncs all data files for persistence and checks stat info for all data files.&lt;/li&gt;
&lt;li&gt;fsync ensures data file consistency before startup; stat is probably to verify files are normal and readable (before the startup process starts, only the readability of the datadir directory was verified).&lt;/li&gt;
&lt;li&gt;Regardless of shutdown state, &lt;code&gt;StartupReorderBuffer&lt;/code&gt; is always called and cleans up spill files for all replication slots.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;When Is the Ready State?
 &lt;div id="when-is-the-ready-state" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#when-is-the-ready-state" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After the startup process finishes its work, the database is not yet in ready state. When the pmState state machine changes state, the &lt;code&gt;reaper&lt;/code&gt; process reaping function is called. The reaper function itself does some recovery or startup work after a child process exits. The pmState state machine records the state as PM_STARTUP, which controls the startup/shutdown state.&lt;/p&gt;
&lt;p&gt;Last steps of &lt;code&gt;PostmasterMain&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	StartupPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartupDataBase&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(StartupPID &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	StartupStatus &lt;span style="color:#f92672"&gt;=&lt;/span&gt; STARTUP_RUNNING;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_STARTUP; &lt;span style="color:#75715e"&gt;//State machine changes state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Some workers may be scheduled to start now */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;maybe_start_bgworkers&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	status &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ServerLoop&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * ServerLoop probably shouldn&amp;#39;t ever return, but if it does, close down.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ExitPostmaster&lt;/span&gt;(status &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; STATUS_OK);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;abort&lt;/span&gt;();					&lt;span style="color:#75715e"&gt;/* not reached */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The core startup flow of &lt;code&gt;PostmasterMain&lt;/code&gt; goes to &lt;code&gt;reaper&lt;/code&gt; to handle the normal exit of the startup process.&lt;/p&gt;
&lt;p&gt;PMState comment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * We use a simple state machine to control startup, shutdown, and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * crash recovery (which is rather like shutdown followed by startup).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * After doing all the postmaster initialization work, we enter PM_STARTUP
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * state and the startup process is launched. The startup process begins by
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * reading the control file and other preliminary initialization steps.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In a normal startup, or after crash recovery, the startup process exits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * with exit code 0 and we switch to PM_RUN state. 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PMState is passed and processed via signals. After the startup process exits, &lt;code&gt;reaper&lt;/code&gt; is activated to reap the process.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;reaper&lt;/code&gt; function handling the startup child process&amp;rsquo;s normal exit:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; StartupPID)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			StartupPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Startup succeeded, commence normal operations
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			StartupStatus &lt;span style="color:#f92672"&gt;=&lt;/span&gt; STARTUP_NOT_RUNNING; &lt;span style="color:#75715e"&gt;//Transition from STARTUP_RUNNING to STARTUP_NOT_RUNNING
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			FatalError &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false; &lt;span style="color:#75715e"&gt;//After none of the above ifs are hit, it&amp;#39;s not fatal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			AbortStartTime &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			ReachedNormalRunning &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			pmState &lt;span style="color:#f92672"&gt;=&lt;/span&gt; PM_RUN; &lt;span style="color:#75715e"&gt;//State machine transitions from PM_STARTUP to PM_RUN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			connsAllowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ALLOW_ALL_CONNS;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Crank up the background tasks, if we didn&amp;#39;t do that already
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * when we entered consistent recovery state. It doesn&amp;#39;t matter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * if this fails, we&amp;#39;ll just try again later.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Below: starting core child processes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (CheckpointerPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				CheckpointerPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartCheckpointer&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (BgWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				BgWriterPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartBackgroundWriter&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WalWriterPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				WalWriterPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartWalWriter&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Likewise, start other special children as needed. In a restart
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * situation, some of them may be alive already.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Below: starting non-core child processes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;IsBinaryUpgrade &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AutoVacuumingActive&lt;/span&gt;() &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; AutoVacPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				AutoVacPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;StartAutoVacLauncher&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;PgArchStartupAllowed&lt;/span&gt;() &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; PgArchPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				PgArchPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;pgarch_start&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (PgStatPID &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				PgStatPID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;pgstat_start&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* workers may be scheduled to start now */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;maybe_start_bgworkers&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 &lt;span style="color:#75715e"&gt;//At this point it&amp;#39;s officially ready to accept connections
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* at this point we are really open for business */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;database system is ready to accept connections&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Report status */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;AddToDataDirLockFile&lt;/span&gt;(LOCK_FILE_LINE_PM_STATUS, PM_STATUS_READY);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#ifdef USE_SYSTEMD
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;sd_notify&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;READY=1&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#endif
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &amp;ldquo;database system is ready to accept connections&amp;rdquo; message is right here.&lt;/p&gt;
&lt;p&gt;Checkpointer, bgwriter, walwriter, autovacuum, arch (if present), stats — all these processes need to be started. At this stage, launching these processes doesn&amp;rsquo;t have to return success; they can be retried later in &lt;code&gt;ServerLoop&lt;/code&gt; or on the next &lt;code&gt;reaper&lt;/code&gt; execution. Only the startup process must start and complete all related tasks in one shot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pid &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* in parent, fork failed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			save_errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; save_errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (type)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; StartupProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork startup process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; BgWriterProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork background writer process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; CheckpointerProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork checkpointer process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; WalWriterProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork WAL writer process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; WalReceiverProcess:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork WAL receiver process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;default&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(LOG,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not fork process: %m&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * fork failure is fatal during startup, but there&amp;#39;s no need to choke
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * immediately if starting other child types fails.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (type &lt;span style="color:#f92672"&gt;==&lt;/span&gt; StartupProcess)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ExitPostmaster&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Spill File Generation Logic Across Versions
 &lt;div id="spill-file-generation-logic-across-versions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spill-file-generation-logic-across-versions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Spill in all versions spills the largest transaction. Here we focus on when spilling happens.&lt;/p&gt;
&lt;p&gt;PG12: pg12 hard-codes 4096 changes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; Size max_changes_in_memory &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Check whether the transaction tx should spill its data to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCheckSerializeTXN&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb, ReorderBufferTXN &lt;span style="color:#f92672"&gt;*&lt;/span&gt;txn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * TODO: improve accounting so we cheaply can take subtransactions into
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * account here.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nentries_mem &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; max_changes_in_memory)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeTXN&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nentries_mem &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG13: Spills when exceeding &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; memory size:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCheckMemoryLimit&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;size &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; logical_decoding_work_mem &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1024L&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Pick the largest transaction (or subtransaction) and evict it from
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * memory by serializing it to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		txn &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReorderBufferLargestTXN&lt;/span&gt;(rb);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeTXN&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PG14: Adds streaming transfer &lt;code&gt;ReorderBufferStreamTXN&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCheckMemoryLimit&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;size &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; logical_decoding_work_mem &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1024L&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Pick the largest transaction (or subtransaction) and evict it from
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * memory by streaming, if possible. Otherwise, spill to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;ReorderBufferCanStartStreaming&lt;/span&gt;(rb) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			(txn &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReorderBufferLargestTopTXN&lt;/span&gt;(rb)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ReorderBufferStreamTXN&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeTXN&lt;/span&gt;(rb, txn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although PG14 has streaming replication, triggering it requires certain conditions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Returns true, if the streaming can be started now, false, otherwise. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;inline&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCanStartStreaming&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	LogicalDecodingContext &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ctx &lt;span style="color:#f92672"&gt;=&lt;/span&gt; rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;private_data;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SnapBuild &lt;span style="color:#f92672"&gt;*&lt;/span&gt;builder &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ctx&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;snapshot_builder;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* We can&amp;#39;t start streaming unless a consistent state is reached. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;SnapBuildCurrentState&lt;/span&gt;(builder) &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; SNAPBUILD_CONSISTENT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * We can&amp;#39;t start streaming immediately even if the streaming is enabled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * because we previously decoded this transaction and now just are
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * restarting.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;ReorderBufferCanStream&lt;/span&gt;(rb) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildXactNeedsSkip&lt;/span&gt;(builder, ctx&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;reader&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;EndRecPtr))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Found a point after SNAPBUILD_FULL_SNAPSHOT where all transactions that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * were running at that point finished. Till we reach that we hold off
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * calling any commit callbacks.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SNAPBUILD_CONSISTENT &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Additional streaming trigger conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Condition 1: All transactions covered by the snapshot have completed (presumably committed or rolled back)&lt;/li&gt;
&lt;li&gt;Condition 2: The context is private data (does this mean two links to one table won&amp;rsquo;t trigger streaming?)&lt;/li&gt;
&lt;li&gt;Condition 3: Transactions in the snapshot are not skippable (probably some special transactions can be skipped)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PG15: Similar to 14, just cleaner functions with less nesting.&lt;/p&gt;
&lt;p&gt;PG16: About the same.&lt;/p&gt;
&lt;p&gt;PG17: About the same, adds &lt;code&gt;DEBUG_LOGICAL_REP_STREAMING_IMMEDIATE&lt;/code&gt; to force streaming.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key points to remember:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PG12 and earlier: hard-coded 4096 changes&lt;/li&gt;
&lt;li&gt;PG13: adds &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; parameter, allowing memory size adjustment to reduce spill probability&lt;/li&gt;
&lt;li&gt;PG14 and later: supports streaming replication&lt;/li&gt;
&lt;li&gt;Triggering streaming also requires certain conditions, so even with streaming, spills can still happen&lt;/li&gt;
&lt;li&gt;PG17: adds &lt;code&gt;debug_logical_replication_streaming&lt;/code&gt; parameter to force streaming&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Spill File Cleanup Logic
 &lt;div id="spill-file-cleanup-logic" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spill-file-cleanup-logic" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Startup-time spill cleanup is just one scenario. There&amp;rsquo;s also walsender startup cleanup and drop slot cleanup.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Walsender Startup Cleanup
 &lt;div id="walsender-startup-cleanup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#walsender-startup-cleanup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt; is called during database startup (before walsender has started) and during walsender startup (while the database is running). Note these are different scenarios, though they call the same function. From the function comment, it&amp;rsquo;s meant to &amp;ldquo;remove leftover serialized reorder buffers&amp;rdquo; — i.e., clean up spill files.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Remove any leftover serialized reorder buffers from a slot directory after a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * prior crash or decoding session exit.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferCleanupSerializedTXNs&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slotname)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DIR		 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;spill_dir;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dirent &lt;span style="color:#f92672"&gt;*&lt;/span&gt;spill_de;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; stat statbuf;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		path[MAXPGPATH &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(path, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s&amp;#34;&lt;/span&gt;, slotname);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* we&amp;#39;re only handling directories here, skip if it&amp;#39;s not ours */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;lstat&lt;/span&gt;(path, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;statbuf) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;S_ISDIR&lt;/span&gt;(statbuf.st_mode))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	spill_dir &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocateDir&lt;/span&gt;(path);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; ((spill_de &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReadDirExtended&lt;/span&gt;(spill_dir, path, INFO)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* only look at names that can be ours */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//Only compare first 3 characters
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strncmp&lt;/span&gt;(spill_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;xid&amp;#34;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(path, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(path),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s/%s&amp;#34;&lt;/span&gt;, slotname,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 spill_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;unlink&lt;/span&gt;(path) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(ERROR,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not remove file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; during removal of pg_replslot/%s/xid*: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								path, slotname)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;FreeDir&lt;/span&gt;(spill_dir);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Two things to note about the above cleanup logic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cleans files whose names start with &amp;ldquo;xid&amp;rdquo;. Obviously, the state file is not cleaned.&lt;/li&gt;
&lt;li&gt;Uses unlink to clean, one file at a time. This can help us devise a faster startup scheme.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Database Startup Cleanup
 &lt;div id="database-startup-cleanup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#database-startup-cleanup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;During database startup, a startup process is forked to clean slots. The cleanup function is the same one walsender calls: &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;One more difference: after walsender restarts, it only cleans spills for the current slot with the same name; whereas during database startup, all slot spills are cleaned sequentially.&lt;/p&gt;
&lt;p&gt;Database startup process, while-loop sequential cleanup logic:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;StartupReorderBuffer&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DIR		 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;logical_dir;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; dirent &lt;span style="color:#f92672"&gt;*&lt;/span&gt;logical_de;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	logical_dir &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocateDir&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; ((logical_de &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ReadDir&lt;/span&gt;(logical_dir, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{	&lt;span style="color:#75715e"&gt;//Exclude . and ..
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;.&amp;#34;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;strcmp&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, &lt;span style="color:#e6db74"&gt;&amp;#34;..&amp;#34;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;//Validate slot name
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* if it cannot be a slot, skip the directory */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;ReplicationSlotValidateName&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name, DEBUG2))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * ok, has to be a surviving logical slot, iterate and delete
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * everything starting with xid-*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ReorderBufferCleanupSerializedTXNs&lt;/span&gt;(logical_de&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;d_name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;FreeDir&lt;/span&gt;(logical_dir);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The while loop calls &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt;, and after that, the logic is the same as walsender startup cleanup.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Manual Cleanup via pg_drop_replication_slot
 &lt;div id="manual-cleanup-via-pg_drop_replication_slot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#manual-cleanup-via-pg_drop_replication_slot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The drop slot cleanup logic is &lt;strong&gt;different&lt;/strong&gt; from the automatic spill file cleanup — it does not call &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Drop slot flow:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_drop_replication_slot(PG_FUNCTION_ARGS)&lt;/code&gt; -&amp;gt; &lt;code&gt;ReplicationSlotDrop(const char *name, bool nowait)&lt;/code&gt; -&amp;gt; &lt;code&gt;ReplicationSlotDropAcquired(void)&lt;/code&gt; -&amp;gt; &lt;code&gt;ReplicationSlotDropPtr&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ReplicationSlotDropPtr&lt;/code&gt;&amp;rsquo;s slot cleanup logic is also interesting:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Permanently drop the replication slot which will be released by the point
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * this function returns.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReplicationSlotDropPtr&lt;/span&gt;(ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		path[MAXPGPATH];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		tmppath[MAXPGPATH];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If some other backend ran this code concurrently with us, we might try
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * to delete a slot with a certain name while someone else was trying to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * create a slot with the same name.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(ReplicationSlotAllocationLock, LW_EXCLUSIVE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Generate pathnames. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(path, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s&amp;#34;&lt;/span&gt;, &lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.name));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(tmppath, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s.tmp&amp;#34;&lt;/span&gt;, &lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.name));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Rename the slot directory on disk, so that we&amp;#39;ll no longer recognize
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * this as a valid slot. Note that if this fails, we&amp;#39;ve got to mark the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * slot inactive before bailing out. If we&amp;#39;re dropping an ephemeral or a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * temporary slot, we better never fail hard as the caller won&amp;#39;t expect
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the slot to survive and this might get called during error handling.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;rename&lt;/span&gt;(path, tmppath) &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#75715e"&gt;//rename file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * We need to fsync() the directory we just renamed and its parent to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * make sure that our changes are on disk in a crash-safe fashion. If
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * fsync() fails, we can&amp;#39;t be sure whether the changes are on disk or
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * not. For now, we handle that by panicking;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * StartupReplicationSlots() will try to straighten it out after
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * restart.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;//fsync persistence
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;START_CRIT_SECTION&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(tmppath, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot&amp;#34;&lt;/span&gt;, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;END_CRIT_SECTION&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If removing the directory fails, the worst thing that will happen is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * that the user won&amp;#39;t be able to create a new slot with the same name
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * until the next server restart. We warn about it, but that&amp;#39;s all.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;rmtree&lt;/span&gt;(tmppath, true))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(WARNING,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not remove directory &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;, tmppath)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * We release this at the very end, so that nobody starts trying to create
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * a slot while we&amp;#39;re still cleaning up the detritus of the old one.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(ReplicationSlotAllocationLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Drop slot doesn&amp;rsquo;t directly unlink files in the slot directory. Instead, it first renames the &lt;code&gt;slotname/&lt;/code&gt; directory to &lt;code&gt;slotname.tmp/&lt;/code&gt;, then unlinks the files inside, and finally removes the &lt;code&gt;slotname.tmp/&lt;/code&gt; directory itself.&lt;/p&gt;
&lt;p&gt;In this, rmtree also loops to unlink files.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Accelerated Startup Scheme After Replication Slot Spill
 &lt;div id="accelerated-startup-scheme-after-replication-slot-spill" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#accelerated-startup-scheme-after-replication-slot-spill" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Deleting 10 million spill files is obviously very slow, but directly moving the directory (&lt;code&gt;mv&lt;/code&gt;) is extremely fast. However, direct &lt;code&gt;mv&lt;/code&gt; requires attention to the name after the move and the state file, as well as knowing which source code step the &lt;code&gt;mv&lt;/code&gt; bypasses.&lt;/p&gt;

&lt;h3 class="relative group"&gt;mv Naming Notes
 &lt;div id="mv-naming-notes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mv-naming-notes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since it was an abnormal shutdown, the startup process will execute &lt;code&gt;SyncDataDirectory&lt;/code&gt; to fsync and stat all data files — this is hard to bypass. After &lt;code&gt;SyncDataDirectory&lt;/code&gt; completes, it starts handling replication slots. When handling slots, it calls &lt;code&gt;StartupReorderBuffer()&lt;/code&gt; -&amp;gt; &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt; to fully clean up spill files.&lt;/p&gt;
&lt;p&gt;Before entering cleanup, &lt;code&gt;ReplicationSlotValidateName&lt;/code&gt; validates the slot name. We can exploit &lt;code&gt;ReplicationSlotValidateName&lt;/code&gt; to trick the startup process into skipping the &lt;code&gt;ReorderBufferCleanupSerializedTXNs&lt;/code&gt; process.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ReplicationSlotValidateName&lt;/code&gt; rules:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReplicationSlotValidateName&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;name, &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; elevel)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (cp &lt;span style="color:#f92672"&gt;=&lt;/span&gt; name; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp; cp&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{ &lt;span style="color:#75715e"&gt;//Key rule here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;((&lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;z&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#f92672"&gt;||&lt;/span&gt; (&lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;9&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#f92672"&gt;||&lt;/span&gt; (&lt;span style="color:#f92672"&gt;*&lt;/span&gt;cp &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;_&amp;#39;&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_INVALID_NAME),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;replication slot name &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt; contains invalid character&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							name),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Replication slot names may only contain lower case letters, numbers, and the underscore character.&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Valid slot names only contain &lt;code&gt;a-z&lt;/code&gt;, &lt;code&gt;0-9&lt;/code&gt;, &lt;code&gt;_&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So when renaming, it&amp;rsquo;s recommended to add a dot &lt;code&gt;.&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Recommended&lt;/em&gt;: &lt;code&gt;slotname.bak&lt;/code&gt;, &lt;code&gt;slotname.20241215&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Not recommended&lt;/em&gt;: &lt;code&gt;slotnamebackup&lt;/code&gt;, &lt;code&gt;slotname20241215&lt;/code&gt;, &lt;code&gt;slotname_bak&lt;/code&gt;, etc.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Not recommended&lt;/em&gt;: &lt;code&gt;.tmp&lt;/code&gt; suffix — slot names with &lt;code&gt;.tmp&lt;/code&gt; have special meaning.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After renaming, you need to create the directory and copy the state file, otherwise the slot will behave strangely on startup (e.g., duplicate slot names, auto-generated slot names, inability to delete slots, downstream unable to start the replication link, etc.).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Recommended mv operations summarized:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd pg_replslot
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mv slotname slotname.bak 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mkdir slotname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cp slotname.bak/state slotname/&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Startup Time Comparison
 &lt;div id="startup-time-comparison" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#startup-time-comparison" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Compare startup speed across different source code flows to see if manual mv/rm acceleration is actually meaningful.&lt;/p&gt;
&lt;p&gt;Reference source logic principles:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Normal shutdown: goes through fsync and stat&lt;/li&gt;
&lt;li&gt;Abnormal shutdown: goes through fsync and stat&lt;/li&gt;
&lt;li&gt;Valid mv: rename slot directory to &lt;code&gt;.bak&lt;/code&gt;, skip unlink&lt;/li&gt;
&lt;li&gt;Invalid mv: rename slot directory to &lt;code&gt;_bak&lt;/code&gt;, spill files start with xid, goes through unlink&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since actual spill files would be too slow, I manually created fake slot directories and spill files: 50 slots total, 400k spills per slot, 20 million spills total, to test startup time (using &lt;code&gt;cp&lt;/code&gt; directory is much faster than &lt;code&gt;cp&lt;/code&gt; or &lt;code&gt;dd&lt;/code&gt; files).&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;#&lt;/th&gt;
 &lt;th&gt;Test Plan&lt;/th&gt;
 &lt;th&gt;Startup Time&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;Normal shutdown; no fsync/stat, no unlink&lt;/td&gt;
 &lt;td&gt;0.1 seconds&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;Normal shutdown, invalid mv; no fsync/stat, unlink&lt;/td&gt;
 &lt;td&gt;11 min 41 sec&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;Abnormal shutdown, valid mv; fsync/stat, no unlink&lt;/td&gt;
 &lt;td&gt;4 min 35 sec&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;Abnormal shutdown, invalid mv; fsync/stat, unlink&lt;/td&gt;
 &lt;td&gt;32 min 2 sec&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;Abnormal shutdown, rm (create slot dir, keep state)&lt;/td&gt;
 &lt;td&gt;13 min 4 sec&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Comparing plans 3 and 5, theoretically in the scenario at hand, a valid mv could achieve startup in about 4 minutes, while rm would take about 13 minutes. (This is a rough comparison; the recovery environment already showed some differences.)&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Designing Data-Intensive Applications (2nd Edition)</title><link>https://lastdba.com/en/2024/09/20/book-notes-designing-data-intensive-applications-2nd-edition/</link><pubDate>Fri, 20 Sep 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/09/20/book-notes-designing-data-intensive-applications-2nd-edition/</guid><description>&lt;p&gt;DDIA-v2 Chinese edition: &lt;a href="https://github.com/Vonng/ddia/tree/v2" target="_blank" rel="noreferrer"&gt;https://github.com/Vonng/ddia/tree/v2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After finishing DDIA-v2, I couldn&amp;rsquo;t put it down. Everything data-related is explained with such clarity — why is it like this? What&amp;rsquo;s the current state? What problems does this have? The observations and ideas are incredibly incisive and concise. Even the nautical-chart-style diagrams at the start of each chapter are fascinating.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: This article is essentially a collection of excerpts from the original work, with almost none of my own thoughts or ideas. I&amp;rsquo;ve simply plucked out the parts I love most. Some knowledge I&amp;rsquo;ve already mastered and some topics too remote are skipped!&lt;/em&gt;&lt;/p&gt;</description><content:encoded>&lt;p&gt;DDIA-v2 Chinese edition: &lt;a href="https://github.com/Vonng/ddia/tree/v2" target="_blank" rel="noreferrer"&gt;https://github.com/Vonng/ddia/tree/v2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;After finishing DDIA-v2, I couldn&amp;rsquo;t put it down. Everything data-related is explained with such clarity — why is it like this? What&amp;rsquo;s the current state? What problems does this have? The observations and ideas are incredibly incisive and concise. Even the nautical-chart-style diagrams at the start of each chapter are fascinating.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: This article is essentially a collection of excerpts from the original work, with almost none of my own thoughts or ideas. I&amp;rsquo;ve simply plucked out the parts I love most. Some knowledge I&amp;rsquo;ve already mastered and some topics too remote are skipped!&lt;/em&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch1: Trade-offs in Data System Architecture
 &lt;div id="ch1-trade-offs-in-data-system-architecture" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch1-trade-offs-in-data-system-architecture" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;OLTP &amp;amp; OLAP
 &lt;div id="oltp--olap" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oltp--olap" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The distinction between OLTP and analytics is not always clear-cut, but the following table lists some typical characteristics:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Attribute&lt;/th&gt;
 &lt;th&gt;Transactional Systems (OLTP)&lt;/th&gt;
 &lt;th&gt;Analytical Systems (OLAP)&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Primary read pattern&lt;/td&gt;
 &lt;td&gt;Point queries (fetch individual records by key)&lt;/td&gt;
 &lt;td&gt;Aggregation over a large number of records&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Primary write pattern&lt;/td&gt;
 &lt;td&gt;Create, update, and delete individual records&lt;/td&gt;
 &lt;td&gt;Bulk import (ETL) or event stream&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Human user example&lt;/td&gt;
 &lt;td&gt;End users of web/mobile applications&lt;/td&gt;
 &lt;td&gt;Internal analysts, for decision support&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Machine use example&lt;/td&gt;
 &lt;td&gt;Check whether an action is authorized&lt;/td&gt;
 &lt;td&gt;Detect fraud/abuse patterns&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Query type&lt;/td&gt;
 &lt;td&gt;Fixed set of queries, predefined by the application&lt;/td&gt;
 &lt;td&gt;Analysts can issue arbitrary queries&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Data representation&lt;/td&gt;
 &lt;td&gt;Latest state of data (current point in time)&lt;/td&gt;
 &lt;td&gt;History of events over time&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Dataset size&lt;/td&gt;
 &lt;td&gt;GB, TB&lt;/td&gt;
 &lt;td&gt;TB, PB&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A &lt;em&gt;data warehouse&lt;/em&gt; is a separate database where analysts can query freely without affecting OLTP operations. Data warehouses typically store data in a very different way from OLTP databases, optimized for the query types common in analytics.
The process of getting data into the data warehouse is called &lt;em&gt;Extract–Transform–Load&lt;/em&gt; (ETL).
Some database systems offer &lt;em&gt;Hybrid Transaction/Analytical Processing&lt;/em&gt; (HTAP), aiming to enable both OLTP and analytics in a single system without ETL from one system to another.
&lt;strong&gt;Despite the existence of HTAP, the separation between transactional and analytical systems remains common&lt;/strong&gt; due to their differing goals and requirements. In particular, it is considered good practice for each business system to have its own database, resulting in hundreds of independent operational databases; on the other hand, an enterprise typically has only one data warehouse, allowing business analysts to combine data from several business systems in a single query.
A &lt;em&gt;data lake&lt;/em&gt; is a centralized data repository that holds any data potentially useful for analysis, sourced from business systems through ETL processes. Unlike a data warehouse, a data lake contains only files and imposes no specific file format or data model. Data warehouses typically use the &lt;em&gt;relational&lt;/em&gt; data model and are queried via SQL.
A &lt;em&gt;data lakehouse&lt;/em&gt; goes beyond a standalone data warehouse by enabling typical data warehouse workloads (SQL queries and business analytics) as well as data science/machine learning workloads to run directly on files in the data lake. This architecture is called a &lt;em&gt;data lakehouse&lt;/em&gt;. It requires a query execution engine and a metadata (e.g., schema management) layer to extend the file storage of the data lake. Apache Hive, Spark SQL, Presto, and Trino are examples of this approach.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Cloud Services vs. Self-Hosting
 &lt;div id="cloud-services-vs-self-hosting" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cloud-services-vs-self-hosting" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The pros and cons of cloud services:
Using cloud services, rather than running comparable software yourself, is essentially outsourcing the operation of that software to a cloud provider. There are strong arguments both for and against using cloud services.&lt;/p&gt;
&lt;p&gt;Advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When you use the cloud, you still need an operations team, but outsourcing basic system administration can &lt;strong&gt;free your team to focus on higher-level problems&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cloud services are especially valuable if your system load varies significantly over time&lt;/strong&gt;. If you provision machines to handle peak load but those computing resources sit idle most of the time, your system becomes less cost-effective.&lt;/li&gt;
&lt;li&gt;Compared to physical machines, &lt;strong&gt;cloud instances can be provisioned faster and come in a wider variety of sizes&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Disadvantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The biggest drawback of cloud services is that you have no control over them.&lt;/li&gt;
&lt;li&gt;If you already have experience setting up and operating the required systems and your load is fairly predictable (i.e., the number of machines you need won&amp;rsquo;t fluctuate dramatically), it is typically cheaper to buy your own machines and run the software yourself.&lt;/li&gt;
&lt;li&gt;If the service lacks a feature you need, your only option is to politely ask the vendor whether they&amp;rsquo;ll add it; you usually can&amp;rsquo;t implement it yourself.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;If the service goes down, you can only wait for it to recover&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;If you use the service in a way that triggers a bug or causes performance issues, &lt;strong&gt;it&amp;rsquo;s very difficult to diagnose the problem&lt;/strong&gt;. With software you run yourself, you can obtain performance metrics and debugging information from the business system to understand its behavior, and you can inspect server logs. But with vendor-hosted services, &lt;strong&gt;you typically don&amp;rsquo;t have access to this internal information&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Moreover, if the service shuts down or becomes unacceptably expensive, or if the vendor decides to change its product in a way you don&amp;rsquo;t like, you&amp;rsquo;re at their mercy — continuing to run an old version of the software is usually not an option, so you&amp;rsquo;ll be forced to migrate to another service. This risk can be mitigated if there are alternative services offering compatible APIs, but for many cloud services, there is no standard API, which increases switching costs and makes &lt;strong&gt;vendor lock-in&lt;/strong&gt; a real problem.&lt;/li&gt;
&lt;li&gt;Latency-critical applications such as high-frequency trading require complete control over hardware, making the cloud a poor choice for such businesses.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Cloud-Native
 &lt;div id="cloud-native" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cloud-native" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Category&lt;/th&gt;
 &lt;th&gt;Self-Hosted Systems&lt;/th&gt;
 &lt;th&gt;Cloud-Native Systems&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Transactional/OLTP&lt;/td&gt;
 &lt;td&gt;MySQL, PostgreSQL, MongoDB&lt;/td&gt;
 &lt;td&gt;AWS Aurora, Azure SQL DB Hyperscale, Google Cloud Spanner&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Analytical/OLAP&lt;/td&gt;
 &lt;td&gt;Teradata, ClickHouse, Spark&lt;/td&gt;
 &lt;td&gt;Snowflake, Google BigQuery, Azure Synapse Analytics&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The key idea behind cloud-native services is not only to use computing resources managed by the business system but also to build on top of lower-level cloud services to create higher-level services. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Object storage&lt;/em&gt; services like Amazon S3, Azure Blob Storage, and Cloudflare R2 store large files. They provide a more limited API than a typical filesystem (basic file reads and writes), but their advantage is hiding the underlying physical machines: the service automatically distributes data across many machines, so you don&amp;rsquo;t need to worry about running out of disk space on any single machine. Even if some machines or their disks fail entirely, no data is lost.&lt;/li&gt;
&lt;li&gt;Many other services are in turn built on top of object storage and other cloud services: for example, Snowflake is a cloud-based analytical database (data warehouse) that relies on S3 for data storage, and some services are further built on top of Snowflake.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Cloud-native systems are typically &lt;em&gt;multi-tenant&lt;/em&gt;, meaning they don&amp;rsquo;t provision separate machines for each customer. Instead, data and computation from several different customers are handled by the same service on shared hardware. Multi-tenancy enables better hardware utilization, easier scalability, and simpler management for cloud providers.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Operations in the Cloud Era
 &lt;div id="operations-in-the-cloud-era" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#operations-in-the-cloud-era" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Traditionally, the people managing an organization&amp;rsquo;s server-side data infrastructure were called database administrators (DBAs) or system administrators (sysadmins). In recent years, many organizations have attempted to integrate software development and operations roles into a single team jointly responsible for backend services and data infrastructure; the &lt;em&gt;DevOps&lt;/em&gt; philosophy has guided this trend. &lt;em&gt;Site Reliability Engineers&lt;/em&gt; (SREs) represent Google&amp;rsquo;s implementation of this philosophy.&lt;/p&gt;
&lt;p&gt;The DevOps/SRE philosophy emphasizes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Automation&lt;/strong&gt; — preferring repeatable processes over one-off manual tasks,&lt;/li&gt;
&lt;li&gt;Preferring ephemeral virtual machines and services over long-running servers,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Promoting frequent application updates&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Learning from incidents&lt;/strong&gt;,&lt;/li&gt;
&lt;li&gt;Preserving organizational knowledge about systems even as individual personnel come and go.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The operations team at an infrastructure company focuses on the details of reliably delivering services to a large number of customers, while the customers of the service spend as little time and energy on infrastructure as possible. Beyond the traditional need for capacity planning, adopting cloud services may be easier and faster than running your own infrastructure. &lt;strong&gt;While the cloud is changing the role of operations, the need for operations remains urgent.&lt;/strong&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch2: Defining Non-Functional Requirements
 &lt;div id="ch2-defining-non-functional-requirements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch2-defining-non-functional-requirements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Hardware and Software Faults
 &lt;div id="hardware-and-software-faults" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hardware-and-software-faults" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In large-scale systems, hardware faults happen frequently enough that they become part of normal system operation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;About 2-5% of disk hard drives fail each year; in a storage cluster with 10,000 disks, we can therefore expect on average one disk failure per day.&lt;/li&gt;
&lt;li&gt;About 0.5-1% of solid-state drives (SSDs) fail each year. Uncorrectable errors occur about once per drive per year.&lt;/li&gt;
&lt;li&gt;About one in 1,000 machines has a CPU core that occasionally computes incorrect results.&lt;/li&gt;
&lt;li&gt;Data in RAM can also be corrupted, due to random events like cosmic rays or permanent physical defects. Additionally, certain pathological memory access patterns can flip bits with high probability.&lt;/li&gt;
&lt;li&gt;Other hardware components such as power supplies, RAID controllers, and memory modules also fail.&lt;/li&gt;
&lt;li&gt;An entire data center can become unavailable (e.g., due to power outages or network misconfiguration) or even permanently destroyed (e.g., fire or flood).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Software faults are often unpredictable and, because they are correlated across nodes, can cause more system failures than hardware faults:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A bug that causes all application server instances to crash upon receiving a specific bad input. For example, the leap second on June 30, 2012, caused many applications to hang simultaneously due to a bug in the Linux kernel.&lt;/li&gt;
&lt;li&gt;A runaway process that exhausts some shared resource — CPU time, memory, disk space, or network bandwidth.&lt;/li&gt;
&lt;li&gt;A service that the system depends on becomes slow, unresponsive, or starts returning incorrect responses.&lt;/li&gt;
&lt;li&gt;Cascading failures, where a small fault in one component triggers a fault in another, which triggers further faults.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Operational configuration errors are the leading cause of service outages, while hardware faults (server or network) account for only 10-25% of service outages.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Scalability Principles
 &lt;div id="scalability-principles" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#scalability-principles" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A good general principle for scalability is to decompose the system into small components that can operate relatively independently. This is the basic principle behind microservices. However, the challenge lies in knowing where to draw the line between things that belong together and things that should be separate.&lt;/p&gt;
&lt;p&gt;If a single-machine database can do the job, it may be preferable to a complex distributed setup. A system with five services is simpler than one with fifty services. Good architecture often involves a mix of approaches.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Operations
 &lt;div id="operations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#operations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;An operations team is critical to keeping software systems running smoothly. The typical responsibilities of a good operations team include (and go beyond) the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Monitoring system health and quickly restoring service when it degrades.&lt;/li&gt;
&lt;li&gt;Tracking down the causes of problems, such as system failures or performance degradation.&lt;/li&gt;
&lt;li&gt;Keeping software and platforms up to date, including security patches.&lt;/li&gt;
&lt;li&gt;Understanding interactions between systems to avoid damaging changes before they cause harm.&lt;/li&gt;
&lt;li&gt;Anticipating future problems and addressing them before they occur (e.g., capacity planning).&lt;/li&gt;
&lt;li&gt;Establishing good practices for deployment, configuration, and management, and writing supporting tools.&lt;/li&gt;
&lt;li&gt;Performing complex maintenance tasks, such as migrating applications from one platform to another.&lt;/li&gt;
&lt;li&gt;Maintaining system security during configuration changes.&lt;/li&gt;
&lt;li&gt;Defining workflows to make operations predictable and maintain production environment stability.&lt;/li&gt;
&lt;li&gt;Preserving organizational knowledge about systems as personnel come and go.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Good operability means easier day-to-day work, allowing the operations team to focus on high-value tasks. Data systems can make routine tasks easier in various ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Providing good monitoring with visibility into the system&amp;rsquo;s internal state and runtime behavior.&lt;/li&gt;
&lt;li&gt;Offering good support for automation, integrating the system with standardized tools.&lt;/li&gt;
&lt;li&gt;Avoiding dependence on a single machine (allowing machines to be taken down for maintenance while the overall system continues running uninterrupted).&lt;/li&gt;
&lt;li&gt;Providing good documentation and an easy-to-understand operational model (&amp;ldquo;if you do X, Y will happen&amp;rdquo;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Providing good default behavior but also allowing administrators to freely override defaults when needed&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Self-healing when possible, but also allowing administrators to manually control system state when needed.&lt;/li&gt;
&lt;li&gt;Predictable behavior, minimizing surprises.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some aspects of operations can and should be automated, but setting up correctly functioning automation in the first place still depends on humans.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Systems with too strong an individual stamp cannot succeed. When the initial design is complete and relatively stable, the real testing begins as different people test it in their own ways.
— Donald Knuth&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;Ch3: Data Models and Query Languages
 &lt;div id="ch3-data-models-and-query-languages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch3-data-models-and-query-languages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Most applications are built by layering one data model on top of another.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;As an application developer, you observe the real world (with people, organizations, goods, actions, money flows, sensors, etc.) and model it in terms of objects or data structures and APIs that manipulate those data structures. These structures are typically specific to your application.&lt;/li&gt;
&lt;li&gt;When you want to store these data structures, you express them in a general-purpose data model, such as JSON or XML documents, tables in a relational database, or vertices and edges in a graph. These data models are the subject of this chapter.&lt;/li&gt;
&lt;li&gt;The engineers who build your database software decided on a way to represent that JSON/relational/graph data as bytes in memory, on disk, or on the network. This representation may allow the data to be queried, searched, manipulated, and processed in various ways. We&amp;rsquo;ll discuss these storage engine designs in a later chapter.&lt;/li&gt;
&lt;li&gt;At an even lower level, hardware engineers have figured out how to represent bytes in terms of electric currents, light pulses, magnetic fields, and so on.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;SQL &amp;amp; NoSQL
 &lt;div id="sql--nosql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql--nosql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Databases can execute declarative queries in parallel across multiple CPU cores and machines, without you needing to worry about how to implement that parallelism. Implementing such parallel execution yourself in hand-coded algorithms would be an enormous undertaking.&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;relational model&lt;/em&gt;, despite being half a century old, remains an important data model for many applications — especially in data warehousing and business analytics, where relational star or snowflake schemas and SQL queries are ubiquitous. However, in other domains, several alternatives to relational data have become popular:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;em&gt;document model&lt;/em&gt; targets use cases where data comes in the form of self-contained JSON documents and relationships between documents are rare.&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;graph data model&lt;/em&gt; goes in the opposite direction, targeting use cases where anything can be related to everything, and queries may need to traverse multiple hops to find data of interest (this can be expressed using recursive queries in Cypher, SPARQL, or Datalog).&lt;/li&gt;
&lt;li&gt;The &lt;em&gt;dataframe&lt;/em&gt; generalizes relational data into a large number of columns, building a bridge between databases and the multidimensional arrays that form the foundation of most machine learning, statistical data analysis, and scientific computing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Databases also tend to expand into adjacent domains by adding support for other data models: for example, relational databases have added support for document data in the form of JSON columns, document databases have added relational-like joins, and support for graph data in SQL is gradually improving.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch4: Storage and Indexing
 &lt;div id="ch4-storage-and-indexing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch4-storage-and-indexing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Hash Indexes
 &lt;div id="hash-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hash-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Key-value stores are quite similar to the &lt;em&gt;dictionary&lt;/em&gt; type found in most programming languages, which is typically implemented using a &lt;em&gt;hash map&lt;/em&gt; or &lt;em&gt;hash table&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Generally, the hash map of a hash index is kept entirely in memory. Data values can use more space than available memory because the required portion can be loaded from disk with a single disk seek.&lt;/p&gt;
&lt;p&gt;Drawbacks of hash indexes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In principle, a hash map can be maintained on disk. Unfortunately, disk-based hash maps struggle to perform well. They require a large amount of random-access I/O, are expensive to grow when exhausted, and require tedious logic to resolve hash collisions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Range queries are inefficient&lt;/strong&gt;. For example, you can&amp;rsquo;t easily scan all keys between kitty00000 and kitty99999 — you must look up each key individually in the hash map.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;B-Tree Indexes
 &lt;div id="b-tree-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#b-tree-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;B-tree indexes have been around since 1970 and are widely accepted and used in the industry.
This section is familiar to most readers — skipped.&lt;/p&gt;

&lt;h3 class="relative group"&gt;SSTables &amp;amp; LSM Trees
 &lt;div id="sstables--lsm-trees" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sstables--lsm-trees" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In hash indexes, the order of key-value pairs doesn&amp;rsquo;t matter. But we can require that the sequence of key-value pairs be sorted by key. This format is called a Sorted String Table, or SSTable.&lt;/p&gt;
&lt;p&gt;Compared to log segments using hash indexes, SSTables have several major advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Even if the file is larger than available memory, merging segments remains simple and efficient. The approach is like the one used in merge sort algorithms: you start reading multiple input files side by side, look at the first key in each file, copy the lowest key (according to the sort order) to the output file, and repeat. This produces a new merged segment file, also sorted by key.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4af781ce48d4.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;To find a particular key in the file, you no longer need to keep an index of all keys in memory. You still need an in-memory index to tell you the offsets for some of the keys, but it can be &lt;em&gt;sparse&lt;/em&gt;: one key per several kilobytes of segment file is sufficient, because several kilobytes can be scanned very quickly.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6c279ab1a032.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;p&gt;Using these data structures, you can insert keys in any order and read them back in sorted order.
Now we can make our storage engine work as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When a new write comes in, add it to an in-memory balanced tree data structure (e.g., a red-black tree). This in-memory tree is sometimes called a &lt;em&gt;memtable&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;When the memtable becomes larger than some threshold (typically a few megabytes), write it out to disk as an SSTable file. This can be done efficiently because the tree already maintains key-value pairs sorted by key. The new SSTable file becomes the most recent segment of the database. While that SSTable is being written to disk, new writes can continue on a new memtable instance.&lt;/li&gt;
&lt;li&gt;When a read request comes in, first try to find the key in the memtable, then in the most recent on-disk segment, then in the next older segment, and so on.&lt;/li&gt;
&lt;li&gt;From time to time, run a merging and compaction process in the background to combine segment files and discard overwritten or deleted values.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The algorithm described here is essentially the technique used by LevelDB and RocksDB, key-value storage engine libraries designed to be embedded in other applications. Similar storage engines are used in Cassandra and HBase, and &lt;strong&gt;all of them were inspired by Google&amp;rsquo;s Bigtable paper (which introduced the terms SSTable and memtable)&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;In-Memory Databases
 &lt;div id="in-memory-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#in-memory-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In-memory databases:
As RAM becomes cheaper, the argument that RAM costs more per GB is eroding. Many datasets are not that large, so keeping them entirely in memory is quite feasible, including potentially distributed across multiple machines. This has led to the development of in-memory databases.
Losing data when restarting a computer may be acceptable. Durability can also be achieved through special hardware (e.g., battery-backed RAM), by writing a change log to disk, by periodically writing snapshots to disk, or by replicating the in-memory state to other machines.&lt;/p&gt;
&lt;p&gt;The typical in-memory database Redis provides weak durability through asynchronous writes to disk. Other in-memory databases include Memcached, VoltDB, MemSQL, Oracle TimesTen, and RAMCloud.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Counterintuitively, the performance advantage of in-memory databases does not come from avoiding disk reads. Instead, they are faster because they avoid the overhead of encoding in-memory data structures into on-disk data structures.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Materialized Views and OLAP
 &lt;div id="materialized-views-and-olap" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#materialized-views-and-olap" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Think of SQL functions like COUNT, SUM, AVG, MIN, or MAX. If the same aggregations are used by many different queries, it may be wasteful to process the raw data each time. Why not cache some of the most frequently used counts or sums? One way to create such a cache is a &lt;em&gt;Materialized View&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;When the underlying data changes, a materialized view needs to be updated because it is a denormalized copy of the data. &lt;strong&gt;The database can do this automatically, but such updates make writes more expensive, which is why materialized views are not commonly used in OLTP databases&lt;/strong&gt;. &lt;strong&gt;In read-heavy data warehouses, they may make more sense because warehouses don&amp;rsquo;t have many small, frequent updates&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The advantage of a materialized data cube is that it can make certain queries extremely fast because they have already been effectively precomputed. For example, if you want to know the total sales per store, you just look at the total along the appropriate dimension without scanning millions of rows of raw data.&lt;/p&gt;
&lt;p&gt;The disadvantage of a data cube is that it lacks the flexibility of querying raw data. For example, there is no way to compute what proportion of sales came from items costing over $100, because price is not one of the dimensions. Therefore, most data warehouses try to keep as much raw data as possible and use aggregate data (like data cubes) only as a performance boost for certain queries.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Column-Oriented Storage
 &lt;div id="column-oriented-storage" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#column-oriented-storage" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The idea behind column-oriented storage is simple: instead of storing all the values from one row together, store all the values from each column together. Column-oriented storage is easiest to understand in the relational data model, but it applies equally to non-relational data. For example, Parquet is a column-oriented storage format that supports a document data model based on Google&amp;rsquo;s Dremel.&lt;/p&gt;
&lt;p&gt;These optimizations (column compression, sorting, etc.) make sense in data warehouses, where the workload consists mainly of large read-only queries run by analysts. Column-oriented storage, compression, and sorting all help read those queries faster. However, their drawback is that writes become more difficult.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch5: Encoding and Evolution
 &lt;div id="ch5-encoding-and-evolution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch5-encoding-and-evolution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;REST vs. RPC
 &lt;div id="rest-vs-rpc" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rest-vs-rpc" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Servers expose APIs over the network, and clients can connect to servers to make requests to those APIs. The API exposed by a server is called a &lt;em&gt;service&lt;/em&gt;. Download data via GET requests, submit data to the server via POST requests.&lt;/p&gt;
&lt;p&gt;When a service uses HTTP as the underlying communication protocol, it can be called a &lt;em&gt;web service&lt;/em&gt;. There are two popular approaches to web services: REST and SOAP. REST is not a protocol but a design philosophy based on HTTP principles. APIs designed according to REST principles are called RESTful.&lt;/p&gt;
&lt;p&gt;Remote Procedure Calls (RPC) are very different from local function calls:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local function calls are predictable and succeed or fail based only on parameters under your control. Network requests are unpredictable: requests or responses may be lost due to network problems, or the remote machine may be slow or unavailable.&lt;/li&gt;
&lt;li&gt;A local function call either returns a result, throws an exception, or never returns (because it enters an infinite loop or the process crashes). A network request has another possible outcome: it may return with no result due to a timeout.&lt;/li&gt;
&lt;li&gt;And so on.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;REST seems to be the dominant style for public APIs, while RPC frameworks mainly focus on requests between services owned by the same organization, typically within the same data center.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch6: Replication
 &lt;div id="ch6-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch6-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Replication logs, failover, single-leader mode — the content is relatively straightforward. Skipped.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Multi-Leader Replication
 &lt;div id="multi-leader-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multi-leader-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Multi-leader replication is often a retrofitted feature in many databases, so it frequently has subtle configuration pitfalls and often interacts unexpectedly with other database features. For example, auto-increment primary keys, triggers, and integrity constraints can all cause trouble. Therefore, &lt;strong&gt;multi-leader replication is often considered dangerous territory and should be avoided whenever possible&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;However, multi-leader replication does have certain advantages, such as distributing write I/O, disaster recovery, and reducing network overhead in multi-region deployments (local writes), etc.&lt;/p&gt;
&lt;p&gt;Write conflicts:
The biggest problem with multi-leader replication is the potential for write conflicts, and resolving them is quite tricky.
In principle, conflict detection could be made synchronous — i.e., wait for writes to be replicated to all replicas before telling the user the write succeeded. But this may defeat the purpose of multi-leader: &lt;strong&gt;if you want synchronous conflict detection, you might as well just use single-leader replication&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Resolving multi-leader write conflicts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Avoid conflicts. For example, have the application control that users only edit their own data.&lt;/li&gt;
&lt;li&gt;Converge to consistency:
&lt;ul&gt;
&lt;li&gt;Last Write Wins (LWW). Write by timestamp — may result in data loss.&lt;/li&gt;
&lt;li&gt;Priority writes. Higher-priority writes win — may result in data loss.&lt;/li&gt;
&lt;li&gt;Extra code. Preserve conflict information and write custom conflict resolution code.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Real-time collaborative editing applications allow multiple people to edit a document simultaneously — Etherpad and Google Docs are mature examples. &lt;strong&gt;Databases are still very young in the area of multi-leader writes&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Multi-leader write conflicts in databases are mostly resolved or avoided at the application level. The following are relatively mature areas of write conflict research for reference:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Conflict-free Replicated Data Types (CRDTs)&lt;/strong&gt; are data structures such as sets, maps, ordered lists, and counters that can be concurrently edited by multiple users and resolve conflicts automatically in a reasonable way. Some CRDTs have been implemented in Riak 2.0.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mergeable Persistent Data Structures&lt;/strong&gt; explicitly track history, similar to the Git version control system, and use three-way merge functions (whereas CRDTs use two-way merges).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Operational Transformation (OT)&lt;/strong&gt; is the conflict resolution algorithm behind collaborative editing applications like Etherpad and Google Docs. It is designed specifically for concurrent editing of ordered lists, such as lists of characters that make up a text document.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Ch7: Partitioning
 &lt;div id="ch7-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch7-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Range Partitioning and Hash Partitioning
 &lt;div id="range-partitioning-and-hash-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#range-partitioning-and-hash-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The drawback of range partitioning is that certain access patterns can lead to hot spots. If the primary key is a timestamp, partitions correspond to time ranges, and all writes will go to the same partition (i.e., today&amp;rsquo;s partition), which may become overloaded with writes while other partitions sit idle.
You can use something other than the timestamp as the first part of the primary key to scatter the hot spot, but the drawback is that range queries won&amp;rsquo;t benefit.&lt;/p&gt;
&lt;p&gt;Hash partitioning can mitigate the risk of skew and hot spots. For the purpose of partitioning, the hash function doesn&amp;rsquo;t need to be a cryptographically strong algorithm.
The drawback of hash partitioning is that by partitioning by key hash, we lose a great property of key-range partitioning: the ability to efficiently execute range queries.&lt;/p&gt;
&lt;p&gt;Hash partitioning can help reduce hot spots. But it cannot eliminate them entirely. For example, on a social media site, a celebrity user with millions of followers doing something can trigger a storm. This event can cause a large number of writes to the same key (the key might be the celebrity&amp;rsquo;s user ID or the ID of the action being commented on). Hash strategies don&amp;rsquo;t help here, because the hash of two identical IDs is still the same.
&lt;strong&gt;If a primary key is very hot, a simple workaround is to add a random number at the beginning or end of the primary key&lt;/strong&gt;. Just a two-digit decimal random number can scatter the primary key into 100 different primary keys, thus stored in different partitions. In any case, it&amp;rsquo;s about scattering hot spots, and you need to consider side effects such as the impact on range queries.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch8: Transactions
 &lt;div id="ch8-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch8-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;ACID, BASE
 &lt;div id="acid-base" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#acid-base" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ACID is actually a very old definition. Due to the later discovery of many &amp;ldquo;anomalies,&amp;rdquo; a system claiming to guarantee ACID can&amp;rsquo;t actually articulate what exactly it guarantees.
Whatever the case, ACID remains deeply ingrained — it represents the most fundamental principles of transactions. Conversely, systems that don&amp;rsquo;t meet the ACID criteria are sometimes called BASE, which stands for Basically Available, Soft State, and Eventual Consistency. BASE is a concept commonly mentioned in the NoSQL world.&lt;/p&gt;
&lt;p&gt;The definition of BASE is even fuzzier than ACID. A simple, easy-to-understand, easy-to-remember theory of BASE: BASE (which means &amp;ldquo;alkali&amp;rdquo; in chemistry) is the opposite of ACID (which means &amp;ldquo;acid&amp;rdquo;).&lt;/p&gt;
&lt;p&gt;You can think of it simply this way:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Relational databases&lt;/td&gt;
 &lt;td&gt;Non-relational databases&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Transactions&lt;/td&gt;
 &lt;td&gt;No transactions&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ACID&lt;/td&gt;
 &lt;td&gt;BASE&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;SQL&lt;/td&gt;
 &lt;td&gt;NoSQL&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Atomicity and isolation within ACID are relatively easy to understand.
The concept of consistency is actually quite vague and doesn&amp;rsquo;t seem closely related to the database itself. A quote in the book is very classic:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Joe Hellerstein pointed out that in Härder and Reuter&amp;rsquo;s paper, &amp;ldquo;the C in ACID&amp;rdquo; was &amp;ldquo;tossed in to make the acronym work,&amp;rdquo; and at the time, nobody cared much about consistency.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;And the definition of isolation is very fuzzy. The industrial practice of serializability has also been stagnant.
Transaction isolation can be described as &amp;ldquo;a mess,&amp;rdquo; but if serializability is a panacea, why does no one use it?
Refer to this article: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/131333588" target="_blank" rel="noreferrer"&gt;The History of Transactions and SSI&lt;/a&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Anomalies in non-serializable isolation levels generally only manifest under high concurrency; databases with low concurrency rarely encounter problems.&lt;/li&gt;
&lt;li&gt;When anomalies do occur, some applications may not notice them or may detect them but find them unimportant.&lt;/li&gt;
&lt;li&gt;Data may be anomalous, but the application may simply return an error and enter an anomaly-handling routine.&lt;/li&gt;
&lt;li&gt;Cost is too high. Not only is the development cost of serializable isolation levels high for databases, but applications also need adaptation costs for serializability. Just understanding this complex theory is no easy task.&lt;/li&gt;
&lt;li&gt;Higher isolation levels lose some performance. Massive rework may be thankless; applications need to choose between &amp;ldquo;high concurrency&amp;rdquo; and &amp;ldquo;no anomalies.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Businesses develop based on mechanisms, not rules. Businesses have somewhat adapted to the anomalies of weaker isolation levels, especially Read Committed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Summed up in one sentence: It&amp;rsquo;s not like it&amp;rsquo;s unusable!&lt;/p&gt;

&lt;h3 class="relative group"&gt;Pessimistic and Optimistic Transaction Models
 &lt;div id="pessimistic-and-optimistic-transaction-models" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pessimistic-and-optimistic-transaction-models" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Two-phase locking is a so-called &lt;em&gt;pessimistic&lt;/em&gt; concurrency control mechanism: it is based on the principle that if something might go wrong (e.g., another transaction holding a lock), it&amp;rsquo;s better to wait until the situation is safe before proceeding. It&amp;rsquo;s like a mutex used to protect data structures in multi-threaded programming.&lt;/p&gt;
&lt;p&gt;In a sense, serial execution could be called the ultimate in pessimism: for the duration of each transaction, each transaction holds an exclusive lock on the entire database (or a partition of the database). As compensation for the pessimism, we make each transaction execute very fast, so the &amp;ldquo;lock&amp;rdquo; is only held for a short time.&lt;/p&gt;
&lt;p&gt;In contrast, &lt;strong&gt;Serializable Snapshot Isolation is an &lt;em&gt;optimistic&lt;/em&gt; concurrency control technique&lt;/strong&gt;. In this context, optimistic means that if there is potential danger, the transaction is not blocked — instead, it continues executing, hoping everything will turn out fine. When a transaction wants to commit, the database checks whether anything bad happened (i.e., whether isolation was violated); if so, the transaction is aborted and must be retried. Only serializable transactions are allowed to commit. &lt;strong&gt;If there is a lot of contention (i.e., many transactions trying to access the same objects), performance suffers because a large proportion of transactions need to be aborted&lt;/strong&gt;. If the system is already near maximum throughput, the additional load from retried transactions can worsen performance.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch9: Distributed Systems
 &lt;div id="ch9-distributed-systems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch9-distributed-systems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Clocks
 &lt;div id="clocks" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clocks" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Clocks are critically important in distributed systems — they can directly affect the visibility, isolation, and correctness of transactions.
In reality, reading a precise point in time is meaningless (from a quantum theory perspective, there is no concept of an absolute point in time; the actual situation is even more complex). Spanner&amp;rsquo;s Google TrueTime API reports a confidence interval for the local clock. The confidence interval reports an extremely short and trustworthy time &lt;em&gt;range&lt;/em&gt; rather than a time point.
For example, if you have two confidence intervals, each containing the earliest and latest possible timestamps ($A = [A_{earliest}, A_{latest}]$, $B=[B_{earliest}, B_{latest}]$), and these two intervals do not overlap (i.e., $A_{earliest} &amp;lt; A_{latest} &amp;lt; B_{earliest} &amp;lt; B_{latest}$), then B definitely happened after A — there is no doubt. Only when the intervals overlap are we uncertain about the order in which A and B occurred.
To ensure that transaction timestamps reflect causality, Spanner deliberately waits for the length of the confidence interval before committing a read-write transaction. To keep the clock uncertainty as small as possible, Google deploys a GPS receiver or atomic clock in every data center, allowing clocks to be synchronized to within about 7 milliseconds.
&lt;strong&gt;Logical clocks&lt;/strong&gt; are based on incrementing counters rather than oscillating quartz crystals. Logical clocks only measure the relative ordering of events.&lt;/p&gt;
&lt;p&gt;Real time may not exist. Responsiveness trumps everything. For most server-side data processing systems, real-time guarantees are uneconomical or unsuitable. Therefore, these systems must endure pauses and clock instability in non-real-time environments.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch10: Consistency and Consensus
 &lt;div id="ch10-consistency-and-consensus" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch10-consistency-and-consensus" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;All the problems we&amp;rsquo;ve assumed are possible: packets in the network can be lost, reordered, duplicated, or arbitrarily delayed; clocks are at best approximate; and nodes can pause (e.g., due to garbage collection) or crash at any time.&lt;/p&gt;

&lt;h3 class="relative group"&gt;CAP
 &lt;div id="cap" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cap" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The formal definition of the CAP theorem is limited to a very narrow scope — it only considers one consistency model (linearizability) and one type of fault (network partitions, or nodes that are alive but disconnected from each other). It doesn&amp;rsquo;t discuss anything about network delays, dead nodes, or other trade-offs. Therefore, despite CAP&amp;rsquo;s historical influence, it has no practical value for designing systems.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Distributed Transactions and Consensus
 &lt;div id="distributed-transactions-and-consensus" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#distributed-transactions-and-consensus" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;All the consensus protocols discussed so far internally use a leader in some form, but they don&amp;rsquo;t guarantee that the leader is unique. Instead, they make a weaker guarantee: the protocol defines an &lt;em&gt;epoch number&lt;/em&gt; (called ballot number in Paxos, view number in Viewstamped Replication, and term number in Raft) and ensures that within each epoch, the leader is unique.
Whenever the current leader is thought to be dead, a vote begins among the nodes to elect a new leader. This election is assigned an incrementing epoch number, so epoch numbers are totally ordered and monotonically increasing. If there is a conflict between leaders from two different epochs (perhaps because the previous leader hadn&amp;rsquo;t actually died), the leader with the higher epoch number prevails.
Designing algorithms that robustly cope with unreliable networks remains an open research problem.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch11: Batch Processing
 &lt;div id="ch11-batch-processing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch11-batch-processing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Services (online systems)&lt;/strong&gt;
Services wait for requests or instructions from clients to arrive. Upon receiving one, the service attempts to process it as quickly as possible and sends back a response. Response time is typically the primary performance metric for services, and availability is usually very important (if clients can&amp;rsquo;t reach the service, users may receive error messages).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Batch processing systems (offline systems)&lt;/strong&gt;
A batch processing system takes a large amount of input data, runs a &lt;em&gt;job&lt;/em&gt; to process it, and produces some output data. This often takes a while (from minutes to days), so typically no user is waiting for the job to finish. Instead, batch jobs typically run periodically (e.g., once a day). The primary performance metric for batch jobs is typically &lt;em&gt;throughput&lt;/em&gt; (the time needed to process input of a certain size). This chapter discusses batch processing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Stream processing systems (near-real-time systems)&lt;/strong&gt;
Stream processing sits between online and offline (batch) processing, so it is sometimes called &lt;em&gt;near-real-time&lt;/em&gt; or &lt;em&gt;nearline&lt;/em&gt; processing. Like batch processing systems, stream processing consumes inputs and produces outputs (without needing to respond to requests). However, stream jobs operate on events shortly after they occur, whereas batch jobs wait for a fixed set of input data. This difference gives stream processing systems lower latency compared to batch processing systems.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The batch processing algorithm MapReduce, published in 2004, was (perhaps over-enthusiastically) called &amp;ldquo;the algorithm that made Google&amp;rsquo;s massive scalability possible.&amp;rdquo; MapReduce is a fairly low-level programming model.&lt;/p&gt;

&lt;h3 class="relative group"&gt;MapReduce and Distributed File Systems
 &lt;div id="mapreduce-and-distributed-file-systems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mapreduce-and-distributed-file-systems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Compared to the query optimizer of a relational database, Unix tools, despite their simplicity, are still remarkably useful.
The biggest limitation of Unix tools is that they can only run on a single machine — this is where tools like Hadoop came in.
MapReduce is somewhat like Unix tools but distributed across thousands of machines. Like Unix tools, it&amp;rsquo;s fairly crude but surprisingly effective.
MapReduce jobs read and write files on a distributed file system. In Hadoop&amp;rsquo;s implementation of MapReduce, this file system is called HDFS (Hadoop Distributed File System), an open-source implementation of the Google File System (GFS).
Besides HDFS, there are various other distributed file systems such as GlusterFS and the Quantcast File System (QFS). Object storage services like Amazon S3, Azure Blob Storage, and OpenStack Swift are similar in many ways.&lt;/p&gt;
&lt;p&gt;To create a MapReduce job, you need to implement two callback functions, Mapper and Reducer, which behave as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mapper&lt;/strong&gt;
The Mapper is called once on each input record. Its job is to extract key-value pairs from the input record. For each input, it can generate any number of key-value pairs (including none). It does not retain any state from one input record to the next, so each record is processed independently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Reducer&lt;/strong&gt;
The MapReduce framework takes the key-value pairs produced by the Mapper, collects all values belonging to the same key, and iteratively calls the Reducer over this set of values. The Reducer can produce output records (e.g., the count of occurrences of the same URL).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using the MapReduce programming model, the physical network communication aspects of computation (getting data from the right machines) are separated from the application logic (processing the data after obtaining it). This separation contrasts sharply with the typical use of databases, where requests to fetch data from the database frequently appear within application code. Because MapReduce handles all network communication, it also frees application code from worrying about partial failures, such as the crash of another node: MapReduce can transparently retry failed tasks without affecting application logic.&lt;/p&gt;
&lt;p&gt;Another common pattern of &amp;ldquo;putting related data together&amp;rdquo; is grouping records by some key (like the GROUP BY clause in SQL). The simplest way to implement this grouping operation with MapReduce is to set up the Mapper so that the key-value pairs it generates use the desired grouping key. The partitioning and sorting process then directs all records with the same partition key to the same Reducer.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Hadoop vs. Distributed Databases
 &lt;div id="hadoop-vs-distributed-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hadoop-vs-distributed-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;As we&amp;rsquo;ve seen, Hadoop is somewhat like a distributed version of Unix, where HDFS is the file system and MapReduce is a peculiar implementation of Unix processes (always running the &lt;code&gt;sort&lt;/code&gt; utility between the Map and Reduce phases). We&amp;rsquo;ve seen how various join and grouping operations can be implemented on top of these primitives.&lt;/p&gt;
&lt;p&gt;When the MapReduce paper was published, it was — in a sense — not new. All the processing and parallel join algorithms we discussed in earlier sections had already been implemented over a decade earlier in so-called &lt;em&gt;massively parallel processing&lt;/em&gt; (MPP) databases. Examples include the Gamma database machine, Teradata, and Tandem NonStop SQL, which were pioneers in this area.&lt;/p&gt;
&lt;p&gt;The biggest difference is that MPP databases focus on executing analytical SQL queries in parallel across a set of machines, whereas the combination of MapReduce and a distributed file system is more like a general-purpose operating system that can run arbitrary programs.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Diversity of Processing Models
 &lt;div id="diversity-of-processing-models" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#diversity-of-processing-models" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Having only two processing models, SQL and MapReduce, is not enough — more diverse models are needed! And due to the openness of the Hadoop platform, implementing a whole range of approaches is feasible, something that was impossible within the monolithic MPP database paradigm.
Traditionally, MPP databases met the needs of business intelligence analytics and business reporting, but this is only one of many domains that use batch processing.
In the years since MapReduce became popular, execution engines for distributed batch processing have matured significantly.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch12: Stream Processing
 &lt;div id="ch12-stream-processing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch12-stream-processing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Skipped.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Event Sourcing
 &lt;div id="event-sourcing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#event-sourcing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Event sourcing is a powerful data modeling technique: from the application&amp;rsquo;s perspective, it&amp;rsquo;s more meaningful to record user actions as immutable events rather than recording the effects of those actions in a mutable database. Event sourcing is similar to the &lt;em&gt;chronicle&lt;/em&gt; data model.
Like change data capture, event sourcing involves storing all changes to application state as a log of change events.
Applications using event sourcing need to pull the event log (representing the data written to the system) and transform it into application state suitable for display to users. The current state is derived from the event log.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Ch13: The Future of Data Systems
 &lt;div id="ch13-the-future-of-data-systems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ch13-the-future-of-data-systems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Lambda Architecture
 &lt;div id="lambda-architecture" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lambda-architecture" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If batch processing is used to reprocess historical data and stream processing is used for recent updates, how do we combine the two? The Lambda Architecture is one proposal for this.
The core idea of the Lambda Architecture is to record incoming data by appending immutable events to an ever-growing dataset, similar to event sourcing.
In the Lambda approach, the stream processor consumes events and quickly produces an approximate update to the view; the batch processor later uses the same set of events and produces a corrected version of the derived view.&lt;/p&gt;
&lt;p&gt;Unix evolved pipelines and files that are just byte sequences, while databases evolved SQL and transactions.
Which approach is better? Of course, it depends on what you want. Unix is &amp;ldquo;simple&amp;rdquo; because it&amp;rsquo;s a fairly thin wrapper around hardware resources; relational databases are &amp;ldquo;simpler&amp;rdquo; because a short declarative query can leverage a lot of powerful infrastructure (query optimization, indexes, join methods, concurrency control, replication, etc.) without requiring the query author to understand the implementation details.
I interpret the NoSQL movement as a desire to apply Unix-like low-level abstractions to the domain of distributed OLTP data storage.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Separation of Application Code and State
 &lt;div id="separation-of-application-code-and-state" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#separation-of-application-code-and-state" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In theory, a database could be a deployment environment for arbitrary application code, much like an operating system. In practice, however, they are poorly suited to this goal. They don&amp;rsquo;t meet the requirements of modern application development, such as dependency and package management, version control, rolling upgrades, evolvability, monitoring, metrics, calls to network services, and integration with external systems.
I believe it makes sense to have some parts of the system specialized for persistent data storage and other parts specialized for running application code. The two can interact while remaining independent.
The trend is to separate stateless application logic from state management (databases): don&amp;rsquo;t put application logic into the database, and don&amp;rsquo;t put persistent state into the application.&lt;/p&gt;
&lt;p&gt;I assert that in most applications, integrity is far more important than timeliness. Violating timeliness may be confusing and annoying, but violating integrity can be catastrophic.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Problems Introduced by Algorithms
 &lt;div id="problems-introduced-by-algorithms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problems-introduced-by-algorithms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Bias and discrimination&lt;/strong&gt;: For example, in racially segregated areas, a person&amp;rsquo;s ZIP code, or even their IP address, is a strong indicator of race. Given this, it seems absurd to believe that an algorithm can somehow take biased data as input and produce fair and unbiased output. Yet this view often seems to lurk among advocates of data-driven decision-making — an attitude satirized as &amp;ldquo;machine learning is like money laundering for bias.&amp;rdquo; Predictive analytics systems simply extrapolate from the past; if the past was discriminatory, they codify that discrimination into rules.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Responsibility and accountability&lt;/strong&gt;: Automated decision-making raises questions about responsibility and accountability. If a person makes a mistake, they can be held accountable, and those affected by the decision can appeal. Algorithms also make mistakes, but when they do, who is responsible?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Privacy and surveillance&lt;/strong&gt;: Let&amp;rsquo;s do a thought experiment. Try replacing the word &lt;strong&gt;data&lt;/strong&gt; with &lt;strong&gt;surveillance&lt;/strong&gt; and see if common phrases still sound as nice. For example: &amp;ldquo;In our surveillance-driven organization, we collect real-time surveillance streams and store them in our surveillance warehouse. Our surveillance scientists use advanced analytics and surveillance processing to gain new insights.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Blind faith in the supremacy of data-driven decisions is not just delusional — it&amp;rsquo;s genuinely dangerous. As data-driven decision-making becomes more prevalent, we need to figure out how to make algorithms more accountable and transparent, how to avoid reinforcing existing biases, and how to fix them when they inevitably err.&lt;/p&gt;
&lt;p&gt;Users barely know what data they&amp;rsquo;re giving us, what data goes into the database, and how the data is retained and processed — most privacy policies are ambiguous, stringing users along without coming clean. If users don&amp;rsquo;t understand what will happen to their data, they can&amp;rsquo;t give any meaningful consent.
For users who disagree with surveillance, the only truly viable alternative is simply not to use the service. But this choice isn&amp;rsquo;t truly free either: if a service is so popular that it is &amp;ldquo;considered a necessity for basic social participation by most,&amp;rdquo; then expecting people to opt out is unreasonable — using it is effectively mandatory.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;Since software and data have such an enormous impact on the world, we engineers must remember that we have a responsibility to work toward the kind of world we want: a world that respects people, that respects humanity. I hope we can work together toward that goal.&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>PostgreSQL CLOG Files and Standby Synchronization Analysis</title><link>https://lastdba.com/en/2024/09/03/postgresql-clog-files-and-standby-synchronization-analysis/</link><pubDate>Tue, 03 Sep 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/09/03/postgresql-clog-files-and-standby-synchronization-analysis/</guid><description>&lt;p&gt;Among all relational databases, PostgreSQL&amp;rsquo;s CLOG is a very special type of log. CLOG&amp;rsquo;s existence is inseparable from PostgreSQL&amp;rsquo;s MVCC mechanism. Some basic knowledge about transaction IDs and CLOG won&amp;rsquo;t be covered in this article. If interested, please refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782857?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522172343394916800211586382%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&amp;amp;request_id=172343394916800211586382&amp;amp;biz_id=0&amp;amp;utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-1-130782857-null-null.nonecase&amp;amp;utm_term=clog&amp;amp;spm=1018.2226.3001.4450" target="_blank" rel="noreferrer"&gt;CLOG and Hint Bits&lt;/a&gt;. This article focuses on the structure of CLOG files, manually locating transaction states, and the CLOG WAL log synchronization mechanism, to further understand PostgreSQL&amp;rsquo;s CLOG.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CLOG Segment
 &lt;div id="clog-segment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-segment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;CLOG Directory
 &lt;div id="clog-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To distinguish from regular logs, PostgreSQL 10 renamed the CLOG and WAL directories &lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Among all relational databases, PostgreSQL&amp;rsquo;s CLOG is a very special type of log. CLOG&amp;rsquo;s existence is inseparable from PostgreSQL&amp;rsquo;s MVCC mechanism. Some basic knowledge about transaction IDs and CLOG won&amp;rsquo;t be covered in this article. If interested, please refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782857?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522172343394916800211586382%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&amp;amp;request_id=172343394916800211586382&amp;amp;biz_id=0&amp;amp;utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-1-130782857-null-null.nonecase&amp;amp;utm_term=clog&amp;amp;spm=1018.2226.3001.4450" target="_blank" rel="noreferrer"&gt;CLOG and Hint Bits&lt;/a&gt;. This article focuses on the structure of CLOG files, manually locating transaction states, and the CLOG WAL log synchronization mechanism, to further understand PostgreSQL&amp;rsquo;s CLOG.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CLOG Segment
 &lt;div id="clog-segment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-segment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;CLOG Directory
 &lt;div id="clog-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To distinguish from regular logs, PostgreSQL 10 renamed the CLOG and WAL directories &lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;pg9.6&lt;/th&gt;
 &lt;th&gt;pg10&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_clog&lt;/td&gt;
 &lt;td&gt;pg_xact&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_xlog&lt;/td&gt;
 &lt;td&gt;pg_wal&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Don&amp;rsquo;t get confused — I was also troubled by pg_xlog and pg_xact for a while&amp;hellip;&lt;/p&gt;

&lt;h3 class="relative group"&gt;CLOG Segment Name
 &lt;div id="clog-segment-name" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-segment-name" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CLOG is also managed by SLRU, and CLOG file naming is also in &lt;code&gt;slru.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define SlruFileName(ctl, path, seg) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	snprintf(path, MAXPGPATH, &amp;#34;%s/%04X&amp;#34;, (ctl)-&amp;gt;Dir, seg)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;%04X&lt;/code&gt; means hexadecimal (&lt;code&gt;X&lt;/code&gt;), width of 4, zero-padded on the left (&lt;code&gt;04&lt;/code&gt;).
Example CLOG filenames:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg_xact&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;262144&lt;/span&gt; Aug &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; 16:29 03C0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;262144&lt;/span&gt; Aug &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 23:04 03C1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;TransactionID and CLOG Location Conversion
 &lt;div id="transactionid-and-clog-location-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transactionid-and-clog-location-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;CLOG only stores transaction ID status, not the transaction ID itself. Through the TransactionID itself, you can directly locate the CLOG file and the position within the file. Before that, we need to understand some fundamentals.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction States Stored in CLOG
 &lt;div id="transaction-states-stored-in-clog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-states-stored-in-clog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are only 4 transaction states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; XidStatus;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_IN_PROGRESS		0x00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_COMMITTED		0x01
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_ABORTED		0x02
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_SUB_COMMITTED	0x03&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Transaction states are only: in progress, committed, aborted, subtransaction committed. Note that transaction IDs don&amp;rsquo;t have an &amp;ldquo;not started&amp;rdquo; state — as soon as a transaction ID is allocated in the database, that transaction has definitely already started.
Conversely, transaction IDs not yet allocated in the database (actually a few — see the extend CLOG section below) correspond to &lt;code&gt;in_progress&lt;/code&gt; status in CLOG.
Four transaction states actually only need 2 bits to store. So 1 byte (8 bits) can store 4 transaction states, and 1 page (8k) can hold 8KB*4=32768 transaction states. These are all defined in the source code:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; Defines &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; CLOG page sizes. A page is the same BLCKSZ as is used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; everywhere &lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; in Postgres.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// CLOG page size = BLCKSZ = 8k (default)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_BITS_PER_XACT	2 							 &lt;/span&gt;&lt;span style="color:#75715e"&gt;// One transaction state occupies 2 bits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACTS_PER_BYTE 4 							 &lt;/span&gt;&lt;span style="color:#75715e"&gt;// 1 byte can hold 4 transaction states
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// 1 page can hold 32768 transaction states, 8KB*4=32768
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACT_BITMASK ((1 &amp;lt;&amp;lt; CLOG_BITS_PER_XACT) - 1) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// Transaction status bitmask = ((1&amp;lt;&amp;lt;2)-1) = 3, expressed in binary as 11
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define SLRU_PAGES_PER_SEGMENT	32 &lt;/span&gt;&lt;span style="color:#75715e"&gt;// 1 segment has 32 pages
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;1 CLOG segment has 32 pages&lt;/li&gt;
&lt;li&gt;1 CLOG page is 8k (typically)&lt;/li&gt;
&lt;li&gt;1 byte has 4 transaction states&lt;/li&gt;
&lt;li&gt;1 transaction state occupies 2 bits&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;CLOG Segment/Page/Byte Conversion
 &lt;div id="clog-segmentpagebyte-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-segmentpagebyte-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Finding which CLOG segment a transaction ID corresponds to is not easy — it&amp;rsquo;s hidden in the comments:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; Note: because TransactionIds are &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; bits and wrap around at &lt;span style="color:#ae81ff"&gt;0xFFFFFFFF&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; CLOG page numbering also wraps around at &lt;span style="color:#ae81ff"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;CLOG_XACTS_PER_PAGE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; and CLOG segment numbering at
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0xFFFFFFFF&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;CLOG_XACTS_PER_PAGE&lt;span style="color:#f92672"&gt;/&lt;/span&gt;SLRU_PAGES_PER_SEGMENT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// segment number = xid/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT = xid/32768/32 // Which CLOG segment the transaction ID corresponds to, xid/32768/32, needs to be converted to hex
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Mapping transaction ID to page, byte, etc. is clearer &lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToPage(xid)	((xid) / (TransactionId) CLOG_XACTS_PER_PAGE) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// Which CLOG page the transaction ID corresponds to, xid/32768
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToPgIndex(xid) ((xid) % (TransactionId) CLOG_XACTS_PER_PAGE) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// The offset within the above page, xid%32768
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToByte(xid)	(TransactionIdToPgIndex(xid) / CLOG_XACTS_PER_BYTE) &lt;/span&gt;&lt;span style="color:#75715e"&gt;// Which byte in the page the transaction ID corresponds to, (xid%32768)/4
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToBIndex(xid)	((xid) % (TransactionId) CLOG_XACTS_PER_BYTE)		&lt;/span&gt;&lt;span style="color:#75715e"&gt;// Which bit index in the above byte (note: bit index, not the bit itself), xid%4
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Generally (with 8k BLCKSZ), 1 CLOG segment has 32 pages; 1 CLOG segment has 32&lt;em&gt;8k bytes, &lt;strong&gt;i.e., CLOG file size is fixed at 256K&lt;/strong&gt;; 1 CLOG segment can hold 4&lt;/em&gt;32*8k transaction states.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg_xact&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll &lt;span style="color:#75715e"&gt;# 256k CLOG segment&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;262144&lt;/span&gt; Aug &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; 16:29 03C0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;262144&lt;/span&gt; Aug &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 23:04 03C1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;CLOG Bit Conversion
 &lt;div id="clog-bit-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-bit-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The functions for setting CLOG bits and getting CLOG bits (corresponding to &lt;code&gt;TransactionIdSetStatusBit&lt;/code&gt; and &lt;code&gt;TransactionIdGetStatus&lt;/code&gt;) both have the following code to obtain which two bits in the CLOG the transaction ID corresponds to:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			bshift &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdToBIndex&lt;/span&gt;(xid) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; CLOG_BITS_PER_XACT;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;byteptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	byteptr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; XactCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;shared&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;page_buffer[slotno] &lt;span style="color:#f92672"&gt;+&lt;/span&gt; byteno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	curval &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#f92672"&gt;*&lt;/span&gt;byteptr &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; bshift) &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; CLOG_XACT_BITMASK;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;bshift&lt;/code&gt; represents the right-shift position, where &lt;code&gt;TransactionIdToBIndex=xid%4&lt;/code&gt;, &lt;code&gt;CLOG_BITS_PER_XACT=2&lt;/code&gt;, &lt;code&gt;CLOG_XACT_BITMASK=3 (binary: 11)&lt;/code&gt;.
The key code for getting CLOG bits &lt;code&gt;curval = (*byteptr &amp;gt;&amp;gt; bshift) &amp;amp; CLOG_XACT_BITMASK&lt;/code&gt; can be understood in two parts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;*byteptr &amp;gt;&amp;gt; bshift&lt;/code&gt; means right-shifting the pointer by 0, 2, 4, or 6 bits&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;amp; CLOG_XACT_BITMASK&lt;/code&gt; is simply taking the last two bits after the right shift (00&amp;amp;11=00, 01&amp;amp;11=01, 10&amp;amp;11=10, 11&amp;amp;11=11)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, calculating the position of a transaction ID&amp;rsquo;s state within a byte:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;xid%4=0: takes bits 7 and 8&lt;/li&gt;
&lt;li&gt;xid%4=1: takes bits 5 and 6&lt;/li&gt;
&lt;li&gt;xid%4=2: takes bits 3 and 4&lt;/li&gt;
&lt;li&gt;xid%4=3: takes bits 1 and 2&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: the transaction ID state&amp;rsquo;s bit positions within a byte are taken in reverse order, not sequentially forward. Byte and page positions are taken in sequential increasing order.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Manually Calculating Transaction ID Position in CLOG File
 &lt;div id="manually-calculating-transaction-id-position-in-clog-file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#manually-calculating-transaction-id-position-in-clog-file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If we want to manually locate a transaction in CLOG using &lt;code&gt;hexdump&lt;/code&gt;, we need to calculate three elements: &lt;strong&gt;&amp;lt;CLOG segment number, offset within segment in bytes, offset on byte in bit index&amp;gt;&lt;/strong&gt;. (This references the approach in &amp;ldquo;PostgreSQL Database Kernel Analysis&amp;rdquo; but with some differences &lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;)&lt;/p&gt;
&lt;p&gt;Before calculating, you also need to understand:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CLOG segment file numbers are in hexadecimal&lt;/li&gt;
&lt;li&gt;hexdump is in hexadecimal, each line holds 16 bytes, i.e., each line holds &lt;code&gt;16*CLOG_XACTS_PER_BYTE=16*4=64&lt;/code&gt; transaction states&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hexdump -s xxx&lt;/code&gt; is in byte units&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following SQL can calculate the position of a transaction ID in CLOG:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- CLOG segment number
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- %4294967296 represents transaction ID wraparound, /(8192*4*32) represents the maximum number of transactions a segment file can contain, to_hex converts to hex for filename, lpad left-pads to 4 digits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lpad(&lt;span style="color:#66d9ef"&gt;upper&lt;/span&gt;(to_hex(txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;))),&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; clog_segmentno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Offset within segment in bytes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- %4294967296 represents transaction ID wraparound, %(8192*32*4) takes the remaining transaction IDs, /4 converts to byte units
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_clog_offset_bytes;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Offset on byte in bit index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- %4294967296 represents transaction ID wraparound, %4 takes the bit index within the byte
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_byte_offset_bitindex;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Or a single SQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lpad(&lt;span style="color:#66d9ef"&gt;upper&lt;/span&gt;(to_hex(txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;))),&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; clog_segmentno,txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_clog_offset_bytes,txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_byte_offset_bitindex;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Practical simulation — computing a transaction ID&amp;rsquo;s state in CLOG:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lpad(&lt;span style="color:#66d9ef"&gt;upper&lt;/span&gt;(to_hex(txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;))),&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; clog_segmentno,txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_clog_offset_bytes,txid_current()&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4294967296&lt;/span&gt;&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; in_byte_offset_bitindex;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; clog_segmentno &lt;span style="color:#f92672"&gt;|&lt;/span&gt; in_clog_offset_bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; in_byte_offset_bitindex 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------+----------------------+-------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0002&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63196&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Rollback is used to roll back the transaction, mainly for easier observation, since most transactions are committed.
Checkpoint is to ensure the CLOG page is flushed — otherwise the CLOG page might still be in the CLOG buffer and not yet written to the CLOG segment file.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd pg_xact&lt;span style="color:#f92672"&gt;/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hexdump &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0002&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;s &lt;span style="color:#ae81ff"&gt;63196&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;n &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;v
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt;f6dc &lt;span style="color:#ae81ff"&gt;95&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt;f6dd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Convert hex to binary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;x96&amp;#39;&lt;/span&gt;::bit(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bit 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;10010110&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When xid%4=3, take bits 1 and 2. So the bit value for this rolled-back transaction is 10, where 10 represents &lt;code&gt;TRANSACTION_STATUS_ABORTED&lt;/code&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why CLOG Usually Contains Many 55s and U&amp;rsquo;s?
 &lt;div id="why-clog-usually-contains-many-55s-and-us" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-clog-usually-contains-many-55s-and-us" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In a typical transactional database CLOG file, a direct hexdump looks like this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hexdump &lt;span style="color:#f92672"&gt;-&lt;/span&gt;C &lt;span style="color:#ae81ff"&gt;0001&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;v&lt;span style="color:#f92672"&gt;|&lt;/span&gt;head &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000010&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000020&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000030&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000040&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000050&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000060&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;00000070&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;000000&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;000000&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;UUUUUUUUUUUUUUUU&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because the committed transaction state = 01 = &lt;code&gt;TRANSACTION_STATUS_COMMITTED&lt;/code&gt;. When 4 consecutive transactions in a byte are all committed, it becomes 01010101.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Binary: 01010101, hex: 55&lt;/li&gt;
&lt;li&gt;Hex 55 in ASCII is &amp;lsquo;U&amp;rsquo;, so when visually examining CLOG files you can generally see many U&amp;rsquo;s&lt;/li&gt;
&lt;li&gt;Occasionally some bytes are not 55 or U because in production environments some transactions occasionally haven&amp;rsquo;t completed or use subtransactions. The committed state of subtransactions in CLOG is 0x03.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Shared CLOG Buffer
 &lt;div id="shared-clog-buffer" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-clog-buffer" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The number of CLOG shared buffers is easy to understand:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Number of shared CLOG buffers.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * On larger multi-processor systems, it is possible to have many CLOG page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * requests in flight at one time which could lead to disk access for CLOG
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * page if the required page is not found in memory. Testing revealed that we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * can get the best performance by having 128 CLOG buffers, more than that it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * doesn&amp;#39;t improve performance.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Unconditionally keeping the number of CLOG buffers to 128 did not seem like
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * a good idea, because it would increase the minimum amount of shared memory
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * required to start, which could be a problem for people running very small
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * configurations. The following formula seems to represent a reasonable
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * compromise: people with very low values for shared_buffers will get fewer
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * CLOG buffers as well, and everyone else will get 128.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CLOGShmemBuffers&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;Min&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt;, &lt;span style="color:#a6e22e"&gt;Max&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;, NBuffers &lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;512&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Translation: Testing has shown that 128 CLOG buffers provide the best performance — more than that doesn&amp;rsquo;t improve performance. However, because some database configurations are too small, 128 CLOG buffers seems a bit large, so it takes 1/512 of the shared_buffers count. In other words:
Number of CLOG buffers = 1/512 shared_buffer, minimum is 4, maximum is 128. Note: these are all buffer counts, not sizes!&lt;/p&gt;
&lt;p&gt;How large is a single buffer?
CLOG buffer is managed by SLRU, and each SLRU page is 8k:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;A page is the same BLCKSZ as is used everywhere&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;We can glimpse the size of shared CLOG buffer from the perspective of CLOG SLRU initialization:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Initialization of shared memory for CLOG
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CLOGShmemSize&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;SimpleLruShmemSize&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;CLOGShmemBuffers&lt;/span&gt;(), CLOG_LSNS_PER_PAGE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The passed &lt;code&gt;CLOGShmemBuffers()&lt;/code&gt; is 4~128, and the passed &lt;code&gt;CLOG_LSNS_PER_PAGE&lt;/code&gt; = 1024 bytes (with 8k pages).
&lt;code&gt;SimpleLruShmemSize&lt;/code&gt; initializes SLRU shared memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SimpleLruShmemSize&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; nslots, &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; nlsns)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Size		sz;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* we assume nslots isn&amp;#39;t so large as to risk overflow */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(SlruSharedData));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;));	&lt;span style="color:#75715e"&gt;/* page_buffer[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(SlruPageStatus));	&lt;span style="color:#75715e"&gt;/* page_status[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;));	&lt;span style="color:#75715e"&gt;/* page_dirty[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;));	&lt;span style="color:#75715e"&gt;/* page_number[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;));	&lt;span style="color:#75715e"&gt;/* page_lru_count[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(LWLockPadded));	&lt;span style="color:#75715e"&gt;/* buffer_locks[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (nlsns &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		sz &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MAXALIGN&lt;/span&gt;(nslots &lt;span style="color:#f92672"&gt;*&lt;/span&gt; nlsns &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(XLogRecPtr));	&lt;span style="color:#75715e"&gt;/* group_lsn[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;BUFFERALIGN&lt;/span&gt;(sz) &lt;span style="color:#f92672"&gt;+&lt;/span&gt; BLCKSZ &lt;span style="color:#f92672"&gt;*&lt;/span&gt; nslots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;SLRU uses some arrays to store SLRU metadata and control information. The sz size is all roughly &lt;code&gt;data type * buffer count&lt;/code&gt;, and these are generally not very large. The main initialized memory is &lt;code&gt;BLCKSZ * nslots&lt;/code&gt;, i.e., &lt;code&gt;8k * (4~128) = (32k~1M)&lt;/code&gt;. So we can &lt;em&gt;roughly&lt;/em&gt; estimate that the shared CLOG buffer size is around 1M.&lt;/p&gt;

&lt;h2 class="relative group"&gt;CLOG WAL: Types, Writing, and Redo
 &lt;div id="clog-wal-types-writing-and-redo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-wal-types-writing-and-redo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When writing CLOG, is CLOG WAL log also written? If so, wouldn&amp;rsquo;t that mean lost CLOG could be restored by reapplying WAL logs to recover transaction states? Let&amp;rsquo;s explore the CLOG WAL writing and redo source code with these questions in mind.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Extend CLOG
 &lt;div id="extend-clog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#extend-clog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;ZeroCLOGPage&lt;/code&gt; writes WAL. &lt;code&gt;ZeroCLOGPage(pageno, true)&lt;/code&gt; is actually &lt;em&gt;only&lt;/em&gt; called by &lt;code&gt;ExtendCLOG&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Make sure that CLOG has room for a newly-allocated XID.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * NB: this is called while holding XidGenLock. We want it to be very fast
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * most of the time; even when it&amp;#39;s not so fast, no actual I/O need happen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * unless we&amp;#39;re forced to write out a dirty clog or xlog page to make room
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * in shared memory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExtendCLOG&lt;/span&gt;(TransactionId newestXact)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pageno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * No work except at first XID of a page. But beware: just after
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * wraparound, the first XID of page zero is FirstNormalTransactionId.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdToPgIndex&lt;/span&gt;(newestXact) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdEquals&lt;/span&gt;(newestXact, FirstNormalTransactionId))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	pageno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdToPage&lt;/span&gt;(newestXact); &lt;span style="color:#75715e"&gt;// CLOG page number converted from TransactionId
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(XactSLRULock, LW_EXCLUSIVE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Zero the page and make an XLOG entry about it */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ZeroCLOGPage&lt;/span&gt;(pageno, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(XactSLRULock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;ZeroCLOGPage&lt;/code&gt; mainly calls &lt;code&gt;WriteZeroPageXlogRec&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Write a ZEROPAGE xlog record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WriteZeroPageXlogRec&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; pageno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogBeginInsert&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogRegisterData&lt;/span&gt;((&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) (&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;pageno), &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;) &lt;span style="color:#a6e22e"&gt;XLogInsert&lt;/span&gt;(RM_CLOG_ID, CLOG_ZEROPAGE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;WriteZeroPageXlogRec&lt;/code&gt; is writing a WAL record, with type &amp;ldquo;RM_CLOG_ID, CLOG_ZEROPAGE&amp;rdquo;.
Using waldump, you can view CLOG_ZEROPAGE. Its proportion is generally very small:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump &lt;span style="color:#f92672"&gt;-&lt;/span&gt;z &lt;span style="color:#ae81ff"&gt;000000010000056&lt;/span&gt;B00000018 &lt;span style="color:#75715e"&gt;--stat=record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; N (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;) Record &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;) FPI &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;) Combined &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---- - --- ----------- --- -------- --- ------------- ---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CLOG&lt;span style="color:#f92672"&gt;/&lt;/span&gt;ZEROPAGE &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ( &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; ( &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; ( &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;) &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; ( &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Extending CLOG page is always in page units. In fact, at the end of a CLOG segment you can easily see 00s:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hexdump 03C2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000 5555 5555 5555 5555 5555 5555 5555 5555
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;001bb30 5555 5555 0055 0000 0000 0000 0000 0000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;001bb40 0000 0000 0000 0000 0000 0000 0000 0000 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;* ## The end of the CLOG file is all zeros
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;001c000&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Truncate CLOG
 &lt;div id="truncate-clog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#truncate-clog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Besides extending CLOG, there&amp;rsquo;s also truncating CLOG. Truncate CLOG is called during vacuum. When called, it writes a truncate CLOG WAL record and flushes the WAL record to disk:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Remove all CLOG segments before the one holding the passed transaction ID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Before removing any CLOG data, we must flush XLOG to disk, to ensure
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * that any recently-emitted FREEZE_PAGE records have reached disk; otherwise
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * a crash and restart might leave us with some unfrozen tuples referencing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * removed CLOG data. We choose to emit a special TRUNCATE XLOG record too.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Replaying the deletion from XLOG is not critical, since the files could
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * just as well be removed later, but doing so prevents a long-running hot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * standby server from acquiring an unreasonably bloated CLOG directory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Since CLOG segments hold a large number of transactions, the opportunity to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * actually remove a segment is fairly rare, and so it seems best not to do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * the XLOG flush unless we have confirmed that there is a removable segment.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;TruncateCLOG&lt;/span&gt;(TransactionId oldestXact, Oid oldestxid_datoid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			cutoffPage;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * The cutoff point is the start of the segment containing oldestXact. We
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * pass the *page* containing oldestXact to SimpleLruTruncate.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// What&amp;#39;s written to WAL is the CLOG position, which is the CLOG page number converted from oldestXact
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cutoffPage &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdToPage&lt;/span&gt;(oldestXact); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;.....
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Write XLOG record and flush XLOG to disk. We record the oldest xid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * we&amp;#39;re keeping information about here so we can ensure that it&amp;#39;s always
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * ahead of clog truncation in case we crash, and so a standby finds out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the new valid xid before the next checkpoint.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// WriteTruncateXlogRec writes the corresponding WAL record and flushes it to disk
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;WriteTruncateXlogRec&lt;/span&gt;(cutoffPage, oldestXact, oldestxid_datoid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// After WAL is written, actually execute the CLOG segment truncation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Now we can remove the old CLOG segment(s) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;SimpleLruTruncate&lt;/span&gt;(XactCtl, cutoffPage);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;WriteTruncateXlogRec&lt;/code&gt; writes a WAL record with &lt;code&gt;RMGR&lt;/code&gt; as &lt;code&gt;RM_CLOG_ID&lt;/code&gt; and &lt;code&gt;info&lt;/code&gt; as &lt;code&gt;CLOG_TRUNCATE&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Write a TRUNCATE xlog record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * We must flush the xlog record to disk before returning --- see notes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * in TruncateCLOG().
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;WriteTruncateXlogRec&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; pageno, TransactionId oldestXact, Oid oldestXactDb)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	XLogRecPtr	recptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xl_clog_truncate xlrec;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xlrec.pageno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pageno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xlrec.oldestXact &lt;span style="color:#f92672"&gt;=&lt;/span&gt; oldestXact;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xlrec.oldestXactDb &lt;span style="color:#f92672"&gt;=&lt;/span&gt; oldestXactDb;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogBeginInsert&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogRegisterData&lt;/span&gt;((&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) (&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;xlrec), &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(xl_clog_truncate));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	recptr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;XLogInsert&lt;/span&gt;(RM_CLOG_ID, CLOG_TRUNCATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;XLogFlush&lt;/span&gt;(recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After generating CLOG WAL records, the redo recovery routine is also needed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * CLOG resource manager&amp;#39;s routines
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;clog_redo&lt;/span&gt;(XLogReaderState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;record)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When redo info type is CLOG_ZEROPAGE, place the read redo information in memory, then write to the CLOG page file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CLOG_ZEROPAGE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pageno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			slotno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;memcpy&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;pageno, &lt;span style="color:#a6e22e"&gt;XLogRecGetData&lt;/span&gt;(record), &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(XactSLRULock, LW_EXCLUSIVE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		slotno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;ZeroCLOGPage&lt;/span&gt;(pageno, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SimpleLruWritePage&lt;/span&gt;(XactCtl, slotno); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;XactCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;shared&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;page_dirty[slotno]);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(XactSLRULock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When redo info type is CLOG_TRUNCATE, place the read redo information in memory, confirm the page is deletable (write page if not), then truncate the segment
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CLOG_TRUNCATE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		xl_clog_truncate xlrec;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;memcpy&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;xlrec, &lt;span style="color:#a6e22e"&gt;XLogRecGetData&lt;/span&gt;(record), &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(xl_clog_truncate));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * During XLOG replay, latest_page_number isn&amp;#39;t set up yet; insert a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * suitable value to bypass the sanity test in SimpleLruTruncate.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		XactCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;shared&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;latest_page_number &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xlrec.pageno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;AdvanceOldestClogXid&lt;/span&gt;(xlrec.oldestXact);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SimpleLruTruncate&lt;/span&gt;(XactCtl, xlrec.pageno);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(PANIC, &lt;span style="color:#e6db74"&gt;&amp;#34;clog_redo: unknown op code %u&amp;#34;&lt;/span&gt;, info);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;What the CLOG redo routine does:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When redo info type is &lt;code&gt;CLOG_ZEROPAGE&lt;/code&gt;: finds a suitable slot (evict if necessary), performs writability checks based on the read redo information (actually the CLOG page number), then writes the page to the CLOG file&lt;/li&gt;
&lt;li&gt;When redo info type is &lt;code&gt;CLOG_TRUNCATE&lt;/code&gt;: based on the read redo information (actually the CLOG page number), confirms the page is deletable (write page if not available), then truncates the CLOG segment&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;CLOG Synchronization Summary
 &lt;div id="clog-synchronization-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#clog-synchronization-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CLOG has only two types of WAL logs, neither containing transaction status information. They are only triggered when extending CLOG pages and truncating CLOG segments, and the written WAL record is just a CLOG page number.
CLOG&amp;rsquo;s WAL log RMGR type has only one: &lt;code&gt;RM_CLOG_ID&lt;/code&gt;. This type has only two info codes: &lt;code&gt;CLOG_ZEROPAGE&lt;/code&gt;, &lt;code&gt;CLOG_TRUNCATE&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* XLOG stuff */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_ZEROPAGE 0x00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_TRUNCATE 0x10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CLOG WAL synchronization summary:
&lt;strong&gt;The standby database is essentially not synchronizing CLOG information — it&amp;rsquo;s only synchronizing some CLOG file expansion and deletion information.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;However, the standby&amp;rsquo;s CLOG file clearly does have status information, and the standby obviously needs this information for visibility checking. How is the transaction status in CLOG synchronized?&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction ID WAL: Types, Writing, and Redo
 &lt;div id="transaction-id-wal-types-writing-and-redo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-wal-types-writing-and-redo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The WAL for rmgr=CLOG doesn&amp;rsquo;t contain transaction status. Does the standby not synchronize CLOG transaction information? No — WAL logs do contain transaction ID status information, and CLOG is also updated:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Roll back a transaction, commit a transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; txid_current();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; txid_current 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1817254&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; txid_current();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; txid_current 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1817258&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CHECKPOINT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_waldump to view transaction ID status in logs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[datalzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_wal]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pg_waldump ..&lt;span style="color:#f92672"&gt;/&lt;/span&gt;..&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_wal&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;000000010000007300000008&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#f92672"&gt;-&lt;/span&gt;E &lt;span style="color:#e6db74"&gt;&amp;#34;1817254|1817258&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: &lt;span style="color:#66d9ef"&gt;Transaction&lt;/span&gt; len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;1817254&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;ED210, prev &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;ED1E0, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;ABORT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;017612&lt;/span&gt; CST
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rmgr: &lt;span style="color:#66d9ef"&gt;Transaction&lt;/span&gt; len (rec&lt;span style="color:#f92672"&gt;/&lt;/span&gt;tot): &lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;, tx: &lt;span style="color:#ae81ff"&gt;1817258&lt;/span&gt;, lsn: &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;EEB08, prev &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;EEAD8, &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;042545&lt;/span&gt; CST
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_waldump: fatal: error &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; WAL record &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;F7C78: invalid record &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;400&lt;/span&gt;F7F88: wanted &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;, got &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The WAL records the status of transaction IDs (1817254, 1817258), recorded as &lt;code&gt;ABORT&lt;/code&gt; and &lt;code&gt;COMMIT&lt;/code&gt; respectively; rmgr is &lt;code&gt;Transaction&lt;/code&gt;.
Transaction ID status is in WAL logs, but does PostgreSQL write it to the standby&amp;rsquo;s CLOG?
Obviously, we need to find this redo information. Based on previous experience, &lt;code&gt;clog_redo&lt;/code&gt; represents the WAL redo source code for rmgr=CLOG. Searching the source for &lt;code&gt;_redo&lt;/code&gt; should find the WAL redo source code for rmgr=Transaction. Searching&amp;hellip; in &lt;code&gt;xact.c&lt;/code&gt; we find the function &lt;code&gt;xact_redo&lt;/code&gt;, which mainly calls &lt;code&gt;xact_redo_commit&lt;/code&gt; and &lt;code&gt;xact_redo_abort&lt;/code&gt;, clearly corresponding to WAL log application logic for committed and rolled-back transactions respectively.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;xact_redo&lt;/span&gt;(XLogReaderState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;record)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint8		info &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;XLogRecGetInfo&lt;/span&gt;(record) &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; XLOG_XACT_OPMASK;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Backup blocks are not used in xact records */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;XLogRecHasAnyBlockRefs&lt;/span&gt;(record));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;==&lt;/span&gt; XLOG_XACT_COMMIT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;xact_redo_commit&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;parsed, &lt;span style="color:#a6e22e"&gt;XLogRecGetXid&lt;/span&gt;(record),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 record&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;EndRecPtr, &lt;span style="color:#a6e22e"&gt;XLogRecGetOrigin&lt;/span&gt;(record));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (info &lt;span style="color:#f92672"&gt;==&lt;/span&gt; XLOG_XACT_ABORT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;xact_redo_abort&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;parsed, &lt;span style="color:#a6e22e"&gt;XLogRecGetXid&lt;/span&gt;(record));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(PANIC, &lt;span style="color:#e6db74"&gt;&amp;#34;xact_redo: unknown op code %u&amp;#34;&lt;/span&gt;, info);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Taking commit as an example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;xact_redo_commit&lt;/span&gt;(xl_xact_parsed_commit &lt;span style="color:#f92672"&gt;*&lt;/span&gt;parsed,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 TransactionId xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 XLogRecPtr lsn,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 RepOriginId origin_id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (standbyState &lt;span style="color:#f92672"&gt;==&lt;/span&gt; STANDBY_DISABLED)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Mark the transaction committed in pg_xact.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;TransactionIdCommitTree&lt;/span&gt;(xid, parsed&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nsubxacts, parsed&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxacts);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#75715e"&gt;// standby logic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Mark the transaction committed in pg_xact. We use async commit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * protocol during recovery to provide information on database
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * consistency for when users try to set hint bits. It is important
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * that we do not set hint bits until the minRecoveryPoint is past
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * this commit record. This ensures that if we crash we don&amp;#39;t see hint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * bits set on changes made by transactions that haven&amp;#39;t yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * recovered. It&amp;#39;s unlikely but it&amp;#39;s good to be safe.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Mark transaction committed in pg_xact
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;TransactionIdAsyncCommitTree&lt;/span&gt;(xid, parsed&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nsubxacts, parsed&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxacts, lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It looks like &lt;code&gt;TransactionIdAsyncCommitTree&lt;/code&gt; is the function we&amp;rsquo;re looking for that writes to CLOG.&lt;/p&gt;
&lt;p&gt;To verify the redo logic for transaction commit information in WAL, let&amp;rsquo;s set three breakpoints on the standby&amp;rsquo;s startup process, then execute &lt;code&gt;begin;select txid_current();commit;&lt;/code&gt; on the source database to commit a transaction, and see if the standby&amp;rsquo;s startup process hits the functions we want to see when doing redo:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(gdb) bt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 TransactionIdAsyncCommitTree (xid=xid@entry=1818665, nxids=0, xids=0x0, lsn=lsn@entry=495398394064) at transam.c:274
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x000000000050c139 in xact_redo_commit (parsed=parsed@entry=0x7ffda52c0fc0, xid=1818665, lsn=495398394064, origin_id=&amp;lt;optimized out&amp;gt;) at xact.c:5805
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x000000000050ffa3 in xact_redo (record=0x2b5ff2434038) at xact.c:5962
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x0000000000519ea5 in StartupXLOG () at xlog.c:7411
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000072f301 in StartupProcessMain () at startup.c:204
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x0000000000528701 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffda52c6ef0) at bootstrap.c:450
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 0x000000000072c459 in StartChildProcess (type=StartupProcess) at postmaster.c:5494
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x000000000072ec44 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x2b5ff242d1c0) at postmaster.c:1407
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 0x000000000048931f in main (argc=3, argv=0x2b5ff242d1c0) at main.c:210
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(gdb) info b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Num Type Disp Enb Address What
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; breakpoint keep y &lt;span style="color:#ae81ff"&gt;0x000000000050c060&lt;/span&gt; in xact_redo_commit at xact.c:&lt;span style="color:#ae81ff"&gt;5753&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breakpoint already hit &lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; times
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; breakpoint keep y &lt;span style="color:#ae81ff"&gt;0x0000000000508190&lt;/span&gt; in TransactionIdCommitTree at transam.c:&lt;span style="color:#ae81ff"&gt;262&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; breakpoint keep y &lt;span style="color:#ae81ff"&gt;0x00000000005081a0&lt;/span&gt; in TransactionIdAsyncCommitTree at transam.c:&lt;span style="color:#ae81ff"&gt;274&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; breakpoint already hit &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; time&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The breakpoint &lt;code&gt;TransactionIdAsyncCommitTree&lt;/code&gt; is hit, and &lt;code&gt;xid=1818665&lt;/code&gt;, which is the transaction ID just committed on the source database. This confirms the code logic we visually traced is correct.
So, &lt;strong&gt;the standby database&amp;rsquo;s CLOG transaction ID status is synchronized by WAL with rmgr=Transaction.&lt;/strong&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;CLOG only stores transaction ID status, not the transaction ID itself&lt;/li&gt;
&lt;li&gt;Transaction status in CLOG files can be manually located via the transaction ID&lt;/li&gt;
&lt;li&gt;WAL for rmgr=CLOG only extends and cleans up CLOG files, it does not update transaction status&lt;/li&gt;
&lt;li&gt;WAL for rmgr=Transaction updates CLOG transaction status&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;&amp;ldquo;Quickly Mastering PostgreSQL Version New Features&amp;rdquo;, p24&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;Yan Shuli, PostgreSQL CLOG Analysis &lt;a href="https://www.modb.pro/db/606433" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/606433&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&amp;ldquo;PostgreSQL Database Kernel Analysis&amp;rdquo;, Chapter 7, p380-390&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title>PostgreSQL Case Study: Analysis of Abnormally Long Planning Time</title><link>https://lastdba.com/en/2024/08/21/postgresql-case-study-analysis-of-abnormally-long-planning-time/</link><pubDate>Wed, 21 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/21/postgresql-case-study-analysis-of-abnormally-long-planning-time/</guid><description>&lt;h2 class="relative group"&gt;Problem Analysis Overview
 &lt;div id="problem-analysis-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database kept OOMing. Analysis revealed the issue was in query plan generation: planning time ~1 second, planning shared hits ~1 million. After thorough investigation, the root cause was identified as bloat in the statistics base table &lt;code&gt;pg_statistic&lt;/code&gt;. On the first SQL execution of a session — due to a CatCacheMiss — the backend accessed and cached an excessive amount of dead tuple data from &lt;code&gt;pg_statistic&lt;/code&gt;. Application connections always spawned new sessions, and the combined memory usage across multiple backends was too large, leading to OOM.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Analysis Overview
 &lt;div id="problem-analysis-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The database kept OOMing. Analysis revealed the issue was in query plan generation: planning time ~1 second, planning shared hits ~1 million. After thorough investigation, the root cause was identified as bloat in the statistics base table &lt;code&gt;pg_statistic&lt;/code&gt;. On the first SQL execution of a session — due to a CatCacheMiss — the backend accessed and cached an excessive amount of dead tuple data from &lt;code&gt;pg_statistic&lt;/code&gt;. Application connections always spawned new sessions, and the combined memory usage across multiple backends was too large, leading to OOM.&lt;/p&gt;
&lt;p&gt;Below is the detailed analysis process.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A certain database kept OOMing and restarting. After investigation, we found that while the number of concurrent sessions wasn&amp;rsquo;t high, each session&amp;rsquo;s memory footprint was quite large. The total memory exceeded the cgroup memory limit, causing OOM.&lt;/p&gt;
&lt;p&gt;We could preliminarily rule out the following causes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not caused by excessive metadata. Too many objects (typically too many partitions) would cause sessions to cache excessive metadata. This database didn&amp;rsquo;t have that many objects.&lt;/li&gt;
&lt;li&gt;Not caused by SQL execution plan issues. Sorting/hash operations might use too much memory. This database didn&amp;rsquo;t fit that scenario — the SQL in question was a simple sequential scan.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;During the investigation, we discovered that any simple SQL query in this database took a very long time to execute, and Planning Buffers showed about 1 million hits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers,timing) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlinfo &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;011&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;012&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlinfo (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;480&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;473&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;010&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;010&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1127312&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- Abnormal planning shared hit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;947&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038&lt;/span&gt; ms &lt;span style="color:#75715e"&gt;-- Abnormal planning time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;035&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Running the same SQL a second time, the planning time was normal.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Investigation Process
 &lt;div id="problem-investigation-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-investigation-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Printing Execution Plan Statistics
 &lt;div id="printing-execution-plan-statistics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#printing-execution-plan-statistics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;We enabled logging for each phase of the execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; log_parser_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; log_planner_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; log_executor_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then ran the SQL. The log output was as follows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-13 10:02:33.936 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,85532,&lt;span style="color:#e6db74"&gt;&amp;#34;[local]&amp;#34;&lt;/span&gt;,66babe8c.14e1c,13,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;PARSER STATISTICS&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0.000046 s user, 0.000046 s system, 0.000091 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! [0.001661 s user, 0.001661 s system total]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 4660 kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [0/8] filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/36 [0/996] page faults/reclaims, 0 [0] swaps
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [5/0] voluntary/involuntary context switches&amp;#34;&lt;/span&gt;,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;explain (analyze,buffers) select *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-08-13 10:02:33.938 CST,&amp;#34;&lt;/span&gt;postgres&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;lzldb&lt;span style="color:#e6db74"&gt;&amp;#34;,85532,&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;,66babe8c.14e1c,14,&amp;#34;&lt;/span&gt;EXPLAIN&lt;span style="color:#e6db74"&gt;&amp;#34;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&amp;#34;&lt;/span&gt;PARSE ANALYSIS STATISTICS&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0.001459 s user, 0.000000 s system, 0.001464 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0.003146 s user, 0.001687 s system total&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#ae81ff"&gt;5972&lt;/span&gt; kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/8&lt;span style="color:#f92672"&gt;]&lt;/span&gt; filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/325 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/1324&lt;span style="color:#f92672"&gt;]&lt;/span&gt; page faults/reclaims, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; swaps
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; signals rcvd, 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;5/0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; voluntary/involuntary context switches&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,&amp;#34;&lt;/span&gt;explain &lt;span style="color:#f92672"&gt;(&lt;/span&gt;analyze,buffers&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-13 10:02:33.938 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,85532,&lt;span style="color:#e6db74"&gt;&amp;#34;[local]&amp;#34;&lt;/span&gt;,66babe8c.14e1c,15,&lt;span style="color:#e6db74"&gt;&amp;#34;EXPLAIN&amp;#34;&lt;/span&gt;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;REWRITER STATISTICS&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0.000001 s user, 0.000000 s system, 0.000001 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! [0.003177 s user, 0.001687 s system total]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 5972 kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [0/8] filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [0/1324] page faults/reclaims, 0 [0] swaps
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [5/0] voluntary/involuntary context switches&amp;#34;&lt;/span&gt;,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;explain (analyze,buffers) select *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-08-13 10:02:34.644 CST,&amp;#34;&lt;/span&gt;postgres&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;lzldb&lt;span style="color:#e6db74"&gt;&amp;#34;,85532,&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;,66babe8c.14e1c,16,&amp;#34;&lt;/span&gt;EXPLAIN&lt;span style="color:#e6db74"&gt;&amp;#34;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&amp;#34;&lt;/span&gt;PLANNER STATISTICS&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0.539964 s user, 0.164083 s system, 0.705718 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0.543248 s user, 0.165770 s system total&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#ae81ff"&gt;745072&lt;/span&gt; kB max resident size -- Abnormal point
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/8&lt;span style="color:#f92672"&gt;]&lt;/span&gt; filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/184803 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/186157&lt;span style="color:#f92672"&gt;]&lt;/span&gt; page faults/reclaims, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; swaps
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; signals rcvd, 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;! 0/1 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;5/1&lt;span style="color:#f92672"&gt;]&lt;/span&gt; voluntary/involuntary context switches&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,&amp;#34;&lt;/span&gt;explain &lt;span style="color:#f92672"&gt;(&lt;/span&gt;analyze,buffers&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-13 10:02:34.644 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,85532,&lt;span style="color:#e6db74"&gt;&amp;#34;[local]&amp;#34;&lt;/span&gt;,66babe8c.14e1c,17,&lt;span style="color:#e6db74"&gt;&amp;#34;EXPLAIN&amp;#34;&lt;/span&gt;,2024-08-13 10:01:48 CST,4/713,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;EXECUTOR STATISTICS&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0.540248 s user, 0.164170 s system, 0.706088 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! [0.543532 s user, 0.165857 s system total]
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 745596 kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/0 [0/8] filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/184898 [0/186252] page faults/reclaims, 0 [0] swaps
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;! 0/1 [5/1] voluntary/involuntary context switches&amp;#34;&lt;/span&gt;,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;explain (analyze,buffers) select *,1 from lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;During the planner phase, memory usage skyrocketed and elapsed time also spiked. This pinpointed the issue to the planner phase within the overall planning stage. There wasn&amp;rsquo;t much else actionable from the stats.&lt;/p&gt;

&lt;h3 class="relative group"&gt;strace Tracing
 &lt;div id="strace-tracing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#strace-tracing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;strace -p &lt;span style="color:#ae81ff"&gt;76419&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;strace: Process &lt;span style="color:#ae81ff"&gt;76419&lt;/span&gt; attached
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;epoll_wait(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;, [&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;EPOLLIN, &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;u32&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15422552&lt;/span&gt;, u64&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15422552&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}}&lt;/span&gt;], &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;Q\0\0\0\262explain (analyze,buffers) s&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;179&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xfed000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x100e000) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x100e000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x100e000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x100e000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1007000) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1007000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1007000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mmap(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;270336&lt;/span&gt;, PROT_READ&lt;span style="color:#f92672"&gt;|&lt;/span&gt;PROT_WRITE, MAP_PRIVATE&lt;span style="color:#f92672"&gt;|&lt;/span&gt;MAP_ANONYMOUS, &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2b7806b0c000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;base/17076/16678&amp;#34;&lt;/span&gt;, O_RDWR) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;base/17076/46160&amp;#34;&lt;/span&gt;, O_RDWR) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7667712&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;base/17076/46168&amp;#34;&lt;/span&gt;, O_RDWR) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;188416&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;base/17076/46170&amp;#34;&lt;/span&gt;, O_RDWR) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;188416&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mmap(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;528384&lt;/span&gt;, PROT_READ&lt;span style="color:#f92672"&gt;|&lt;/span&gt;PROT_WRITE, MAP_PRIVATE&lt;span style="color:#f92672"&gt;|&lt;/span&gt;MAP_ANONYMOUS, &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2b78c1b36000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1007000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x102c000) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x102c000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x102c000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x102c000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1025000) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1025000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk(&lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x1025000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek(&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, SEEK_END) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7667712&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;open&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_stat_tmp/pgss_query_texts.stat&amp;#34;&lt;/span&gt;, O_RDWR&lt;span style="color:#f92672"&gt;|&lt;/span&gt;O_CREAT, &lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pwrite64(&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;explain (analyze,buffers) select&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;93934&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pwrite64(&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\0&amp;#34;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;94106&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;close&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\250\3\0\0\264B\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;936&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\10\1\0\0\264B\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\10\1\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\16\0\0\0H\0\0\0\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;T\0\0\0#\0\1QUERY PLAN\0\0\0\0\0\0\0\0\0\0\31\377\377\377\377&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;826&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;826&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xd2b4e0, &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; EAGAIN (Resource temporarily unavailable)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;epoll_wait(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;, &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although there were many shared hits, strace didn&amp;rsquo;t reveal much. strace showed the session only opened 4 data files. Using fd and oid2name to look up the data files, they turned out to be: the table, two indexes on the table, and &lt;code&gt;pathman_config&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;From database &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filenode Table Name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;46170&lt;/span&gt; ix_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;46168&lt;/span&gt; pk_lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;46160&lt;/span&gt; lzlinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16678&lt;/span&gt; pathman_config&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;These objects are not large, so it didn&amp;rsquo;t look like oversized tables (or indexes) were the cause.&lt;/p&gt;

&lt;h3 class="relative group"&gt;perf
 &lt;div id="perf" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#perf" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;(No screenshot — use your imagination.)&lt;/p&gt;
&lt;p&gt;The perf flame graph showed ~40% of the time spent on the &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; stack.&lt;/p&gt;

&lt;h3 class="relative group"&gt;gdb
 &lt;div id="gdb" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gdb" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Using &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; as a clue, after multiple gdb sessions, we set the following breakpoints to investigate:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b relation_open
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b get_relation_info
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b RelationCacheInvalidateEntry 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b get_relname_relid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b AcceptInvalidationMessages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b RelationClearRelation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b pg_hint_plan_planner
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;b heap_hot_search_buffer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When breakpoints first hit, there was a lot of noise — they were normal logic. But later, after execution reached a certain point, only &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; kept hitting:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Breakpoint 15, heap_hot_search_buffer &lt;span style="color:#f92672"&gt;(&lt;/span&gt;tid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;tid@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x2313c60, relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x2b2141663910, buffer&lt;span style="color:#f92672"&gt;=&lt;/span&gt;17045, snapshot&lt;span style="color:#f92672"&gt;=&lt;/span&gt;snapshot@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x228a058, heapTuple&lt;span style="color:#f92672"&gt;=&lt;/span&gt;heapTuple@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x23273d0, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; all_dead&lt;span style="color:#f92672"&gt;=&lt;/span&gt;all_dead@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x7ffce272e28f, first_call&lt;span style="color:#f92672"&gt;=&lt;/span&gt;true&lt;span style="color:#f92672"&gt;)&lt;/span&gt; at heapam.c:1503
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1503&lt;/span&gt; in heapam.c
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Continuing.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Breakpoint 15, heap_hot_search_buffer &lt;span style="color:#f92672"&gt;(&lt;/span&gt;tid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;tid@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x2313c60, relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x2b2141663910, buffer&lt;span style="color:#f92672"&gt;=&lt;/span&gt;96708, snapshot&lt;span style="color:#f92672"&gt;=&lt;/span&gt;snapshot@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x228a058, heapTuple&lt;span style="color:#f92672"&gt;=&lt;/span&gt;heapTuple@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x23273d0, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; all_dead&lt;span style="color:#f92672"&gt;=&lt;/span&gt;all_dead@entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x7ffce272e28f, first_call&lt;span style="color:#f92672"&gt;=&lt;/span&gt;true&lt;span style="color:#f92672"&gt;)&lt;/span&gt; at heapam.c:1503
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1503&lt;/span&gt; in heapam.c&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Most arguments passed to &lt;code&gt;heap_hot_search_buffer&lt;/code&gt; remained unchanged — including the addresses of &lt;code&gt;relation&lt;/code&gt; and &lt;code&gt;heapTuple&lt;/code&gt; — only the &lt;code&gt;buffer&lt;/code&gt; parameter changed, indicating it was scanning the same relation.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;heapTuple&lt;/code&gt; contained table OID information. Let&amp;rsquo;s print it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; p *heapTuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$46 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_len &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 968, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_self &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ip_blkid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bi_hi &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bi_lo &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7211&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;}&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ip_posid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;}&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_tableOid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 2619, -- This is useful
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_data &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x2b2155fced00&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;heap_hot_search_buffer&lt;/code&gt; was called with OID=2619. Looking up 2619 in &lt;code&gt;pg_class&lt;/code&gt;, it&amp;rsquo;s &lt;code&gt;pg_statistic&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,relname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; oid &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;2619&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2619&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_statistic&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Accessing the statistics base table is expected — PG needs statistics to estimate costs when generating candidate execution plans.&lt;/p&gt;

&lt;h3 class="relative group"&gt;pg_statistic Bloat
 &lt;div id="pg_statistic-bloat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_statistic-bloat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Now that we&amp;rsquo;ve pinpointed &lt;code&gt;pg_statistic&lt;/code&gt;, let&amp;rsquo;s check its condition:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dt&lt;span style="color:#f92672"&gt;+&lt;/span&gt; pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; List &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; relations
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Persistence &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------------+-------+----------+-------------+---------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_statistic &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; permanent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1036&lt;/span&gt; MB &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;pg_statistic&amp;#39;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;-------+------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2619&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relnamespace &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reltype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reloftype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relowner &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relam &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relfilenode &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2619&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reltablespace &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;132481&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reltuples &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4655&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;pg_statistic&lt;/code&gt; is 1GB — certainly oversized. 132,481 blocks but only 4,655 rows — this is clearly table bloat. But even with bloat, does accessing statistics really require caching the entire &lt;code&gt;pg_statistic&lt;/code&gt; table? Logically, no — you only need the statistics for the specific table. And indeed, PG accesses &lt;code&gt;pg_statistic&lt;/code&gt; through its primary key index &lt;code&gt;pg_statistic_relid_att_inh_index&lt;/code&gt;. From the call stack below, we can see the composite primary key fields being passed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;bt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000086edbc &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; SearchCatCacheMiss (&lt;span style="color:#66d9ef"&gt;cache&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;cache&lt;/span&gt;&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x226ba80, nkeys&lt;span style="color:#f92672"&gt;=&lt;/span&gt;nkeys&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, hashValue&lt;span style="color:#f92672"&gt;=&lt;/span&gt;hashValue&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;853716409&lt;/span&gt;, hashIndex&lt;span style="color:#f92672"&gt;=&lt;/span&gt;hashIndex&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;, v1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v1&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;, v2&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v2&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; v3&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v3&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, v4&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v4&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; catcache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1368&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x000000000086fa82 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; SearchCatCacheInternal (v4&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, v3&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, v2&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, v1&lt;span style="color:#f92672"&gt;=&amp;lt;&lt;/span&gt;optimized &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;, nkeys&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;cache&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x226ba80) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; catcache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1299&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; SearchCatCache3 (&lt;span style="color:#66d9ef"&gt;cache&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x226ba80, v1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v1&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;, v2&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v2&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, v3&lt;span style="color:#f92672"&gt;=&lt;/span&gt;v3&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; catcache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1183&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000880d70 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; SearchSysCache3 (cacheId&lt;span style="color:#f92672"&gt;=&lt;/span&gt;cacheId&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;, key1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;key1&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;, key2&lt;span style="color:#f92672"&gt;=&lt;/span&gt;key2&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, key3&lt;span style="color:#f92672"&gt;=&lt;/span&gt;key3&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; syscache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1145&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000000000874092 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; get_attavgwidth (relid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;relid&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;, attnum&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; lsyscache.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2991&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x00000000006a2d46 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; set_rel_width (root&lt;span style="color:#f92672"&gt;=&lt;/span&gt;root&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x2326600, rel&lt;span style="color:#f92672"&gt;=&lt;/span&gt;rel&lt;span style="color:#f92672"&gt;@&lt;/span&gt;entry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x21e8418) &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; costsize.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5516&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The call passes &lt;code&gt;relid=relid@entry=18767, attnum=1&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,starelid,staattnum &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_statistic &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; starelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; starelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; staattnum 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132657&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132657&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132657&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132657&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;132658&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- lzlinfo has 10 columns total, each with a staattnum entry&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the ctid, we can see this data actually lives in just 2 blocks.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s access &lt;code&gt;pg_statistic&lt;/code&gt; via the composite primary key index. Even with data in only 2 blocks, it took 1 second to access with ~1 million (1,141,568) shared hits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers,timing,&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,starelid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_statistic &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; starelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18767&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; pg_statistic_relid_att_inh_index &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; pg_catalog.pg_statistic (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;103&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;105&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;416&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1035&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: ctid, starelid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (pg_statistic.starelid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;18767&amp;#39;&lt;/span&gt;::oid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1141568&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- Abnormal
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;102&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;1035&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;802&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Accessing 10 rows in &lt;code&gt;pg_statistic&lt;/code&gt; via the index resulted in ~1M shared hits — roughly matching the ~1M planning shared hits from the original SQL. (Note: Planning Time here is minimal, meaning the issue is not in plan generation per se, but in the data access during planning.)&lt;/p&gt;

&lt;h3 class="relative group"&gt;Index Dead Tuples
 &lt;div id="index-dead-tuples" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-dead-tuples" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If vacuum hasn&amp;rsquo;t truly &amp;ldquo;run properly&amp;rdquo;, index dead tuples still point to dead heap tuples.&lt;/p&gt;
&lt;p&gt;Refer to: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/137368881?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522172420012616800225589534%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&amp;amp;request_id=172420012616800225589534&amp;amp;biz_id=0&amp;amp;utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-2-137368881-null-null.nonecase&amp;amp;utm_term=%E8%86%A8%E8%83%80&amp;amp;spm=1018.2226.3001.4450" target="_blank" rel="noreferrer"&gt;From Very Slow Unique Index Scans to Index Bloat&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/16f28ad1a331.png" alt="image.png" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;autovacuum Not Reclaiming Dead Tuples
 &lt;div id="autovacuum-not-reclaiming-dead-tuples" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#autovacuum-not-reclaiming-dead-tuples" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;With such severe table bloat, shouldn&amp;rsquo;t autovacuum have reclaimed it?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;select * from pg_stat_all_tables where relname=&amp;#39;pg_statistic&amp;#39;\gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-[ RECORD 1 ]-------+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relid | 2619
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schemaname | pg_catalog
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname | pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;seq_scan | 1 	 -- Very few sequential scans on pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;seq_tup_read | 4655
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idx_scan | 28715508 -- Many index scans on pg_statistic
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;idx_tup_fetch | 25150245
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_tup_ins | 46
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_tup_upd | 1292143 -- Lots of updates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_tup_del | 14
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_tup_hot_upd | 138448
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_live_tup | 4655
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_dead_tup | 1496776
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_mod_since_analyze | 1292203
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_ins_since_vacuum | 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_vacuum | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_autovacuum | 2024-08-16 20:34:15.045022+08 -- Note: autovacuum timestamp is recent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_analyze | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;last_autoanalyze | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vacuum_count | 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;autovacuum_count | 144170
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;analyze_count | 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;autoanalyze_count | 0&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Actually, autovacuum was constantly running on &lt;code&gt;pg_statistic&lt;/code&gt;, but the worker process may not have been visible because it finished quickly (having nothing to actually reclaim) and went back to naptime:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;show autovacuum_naptime ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; autovacuum_naptime 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1min&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It naps every 1 minute, and the logs show autovacuum info printed every 1 minute as well:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-16 21:05:15.267 CST,,,41080,,66bf4e87.a078,1,,2024-08-16 21:05:11 CST,27/166839,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzldb.pg_catalog.pg_statistic&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;pages: 0 removed, 132685 remain, 1 skipped due to pins, 0 skipped frozen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;tuples: 0 removed, 1501745 remain, 1497090 are dead but not yet removable, oldest xmin: 119329380
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;buffer usage: 265443 hits, 0 misses, 0 dirtied
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;system usage: CPU: user: 0.53 s, system: 0.17 s, elapsed: 3.38 s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;WAL usage: 1 records, 0 full page images, 233 bytes&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-08-16 21:05:17.474 CST,,,41080,,66bf4e87.a078,2,,2024-08-16 21:05:11 CST,27/166844,136438968,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic analyze of table &amp;#34;&amp;#34;lzldb.public.lzlinfo&amp;#34;&amp;#34; system usage: CPU: user: 2.02 s, system: 0.00 s, elapsed: 2.08 s&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;1497090 are dead but not yet removable&lt;/code&gt; — although autovacuum was triggered, it didn&amp;rsquo;t reclaim any dead tuples at all. 1,497,090 dead tuples remained uncleaned.&lt;/p&gt;
&lt;p&gt;Investigating who held &lt;code&gt;oldest xmin: 119329380&lt;/code&gt;, we quickly identified a replication slot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wal_status &lt;span style="color:#f92672"&gt;|&lt;/span&gt; safe_wal_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------+----------+-----------+--------+----------+-----------+--------+------------+--------+--------------+--------------+---------------------+------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slotslotlostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pgoutput &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17076&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;119329380&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;F9&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;105&lt;/span&gt;A4970 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;F9&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;105&lt;/span&gt;F8778 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The slot&amp;rsquo;s &lt;code&gt;catalog_xmin=119329380&lt;/code&gt; matched the vacuum&amp;rsquo;s &lt;code&gt;oldest xmin: 119329380&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;active=f&lt;/code&gt; indicated that the replication link was already broken.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Fixing the Problem
 &lt;div id="fixing-the-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fixing-the-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Drop the replication slot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_drop_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;slotslotlostname&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_drop_replication_slot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then manually vacuum or wait 1 minute for autovacuum.&lt;/p&gt;
&lt;p&gt;Finally, open a brand-new session to verify the fix:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; psql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql (&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;help&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; help.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; now connected &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers,timing) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlinfo &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;023&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;025&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlinfo (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3802&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;473&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2578&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;605&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;098&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Planning time dropped from ~1 second to ~10 ms, and planning shared hits dropped from ~1M to ~2K. The problem was basically resolved.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Case Summary
 &lt;div id="case-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#case-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The replication link broke and the replication slot wasn&amp;rsquo;t cleaned up in time, leading to bloat in the &lt;code&gt;pg_statistic&lt;/code&gt; statistics base table. This caused each backend to be very slow when loading statistics for the first time and to read excessive pages into its local cache. Each backend&amp;rsquo;s cache exceeded normal levels (~2GB), and with multiple backends this led to OOM.&lt;/p&gt;
&lt;p&gt;The problem itself is simple — it was just the investigation that was convoluted. In short: bloat in the base table &lt;code&gt;pg_statistic&lt;/code&gt; caused excessive data access during the plan generation phase. Metadata base table bloat can cause other tricky problems too — until next time.&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Average Is Over</title><link>https://lastdba.com/en/2024/08/13/book-notes-average-is-over/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/book-notes-average-is-over/</guid><description>&lt;h2 class="relative group"&gt;Why I Read This Book
 &lt;div id="why-i-read-this-book" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-i-read-this-book" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;In the final pages of &lt;em&gt;Elon Musk&lt;/em&gt;, the author briefly introduced two books by economist Tyler Cowen: &lt;em&gt;The Great Stagnation&lt;/em&gt; and &lt;em&gt;Average Is Over&lt;/em&gt;. &lt;em&gt;The Great Stagnation&lt;/em&gt; is about why America&amp;rsquo;s development has stalled over the past 40 years — something I&amp;rsquo;m naturally not that interested in. But &lt;em&gt;Average Is Over&lt;/em&gt; is not a study of history; it&amp;rsquo;s a perspective on future development, especially the impact of AI on human life.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Why I Read This Book
 &lt;div id="why-i-read-this-book" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-i-read-this-book" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;In the final pages of &lt;em&gt;Elon Musk&lt;/em&gt;, the author briefly introduced two books by economist Tyler Cowen: &lt;em&gt;The Great Stagnation&lt;/em&gt; and &lt;em&gt;Average Is Over&lt;/em&gt;. &lt;em&gt;The Great Stagnation&lt;/em&gt; is about why America&amp;rsquo;s development has stalled over the past 40 years — something I&amp;rsquo;m naturally not that interested in. But &lt;em&gt;Average Is Over&lt;/em&gt; is not a study of history; it&amp;rsquo;s a perspective on future development, especially the impact of AI on human life.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve always been interested in what human life will look like in the future. Recently, OpenAI has been hot, and it feels like the AI era is upon us. What changes will AI bring to our lives and work? Will social structures shift? Which jobs will gradually disappear? Which jobs will benefit?&lt;/p&gt;

&lt;h2 class="relative group"&gt;Chess
 &lt;div id="chess" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#chess" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The book spends a large portion (nearly half) discussing chess and computer programs. You can tell the author is definitely a chess enthusiast — he&amp;rsquo;s deeply knowledgeable about chess history and its evolution. Reading this section always reminds me of &lt;em&gt;The Queen&amp;rsquo;s Gambit&lt;/em&gt;. If it weren&amp;rsquo;t for that show, I wouldn&amp;rsquo;t have known chess had rapid formats or that the Soviet Union was the world&amp;rsquo;s strongest chess nation. The author also uses chess to explore the influence of computer programs on the game.&lt;/p&gt;
&lt;p&gt;This influence goes beyond AlphaGo defeating the world&amp;rsquo;s strongest human Go player — the &amp;ldquo;beating the brightest human minds&amp;rdquo; kind of impact. It also includes how early chess programs changed the way humans learn chess. In the early days of chess, before computers took off, people could only learn chess from other people. A beginner couldn&amp;rsquo;t often play against a chess master. But as computer programs became widespread, they were adopted en masse. Chess programs could teach you, you could play against them, and you could even set the difficulty level. This was incredibly convenient for beginners. Without us even noticing, computer programs quietly reshaped our lives. In the future, we will increasingly collaborate with AI.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Polarization
 &lt;div id="polarization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#polarization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Once AI is widely deployed, many aspects of our lives will change. AI is unlikely to revolutionarily overturn the social structure of rich and poor; the reality that a tiny minority controls the vast majority of wealth may intensify further. The middle class is perhaps the most vulnerable stratum. Many middle-class workers perform partially intellectual but repetitive work — exactly AI&amp;rsquo;s sweet spot. The book argues that the value of middle-class work isn&amp;rsquo;t actually that great and may be relatively easily replaced. Disparities in basic assets will widen the gap in wealth accumulation — in other words, differences in starting capital will amplify differences in asset accumulation. In this age, that sentence is easy to understand.&lt;/p&gt;
&lt;p&gt;The book approaches wealth distribution from an American perspective, but it&amp;rsquo;s easy to map onto the Chinese context. China&amp;rsquo;s economic development over recent decades has been truly remarkable — the dividends of population and infrastructure construction, a phase all developed nations went through. But the introduction of market economics and the passage of time have been accompanied by growing wealth inequality. Let&amp;rsquo;s leave it there&amp;hellip; I don&amp;rsquo;t want to write anything too sensitive&amp;hellip;&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Rising Cost of Learning
 &lt;div id="the-rising-cost-of-learning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-rising-cost-of-learning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The cost of learning keeps rising. This doesn&amp;rsquo;t refer to the cost of tuition or training courses, but the difficulty of learning or mastering a profession. The word &amp;ldquo;inventor&amp;rdquo; — I suspect many people haven&amp;rsquo;t heard it in a long time. Our impression of the term is still stuck in the Edison era. Back then, individuals could invent things on their own; they just needed some relatively advanced knowledge in their field and a bit of brainpower. &amp;ldquo;Inventing&amp;rdquo; didn&amp;rsquo;t seem that hard. But as time passed, we rarely hear the word &amp;ldquo;inventor&amp;rdquo; anymore. It&amp;rsquo;s not that humans have stopped inventing — it&amp;rsquo;s that what people invent now is almost always the work of a team, many people, often requiring cross-disciplinary collaboration among multiple specialists. The cost of &amp;ldquo;inventing&amp;rdquo; things keeps rising because the knowledge required to master a field grows ever larger and more complex. It&amp;rsquo;s unrealistic for one person to master an entire industry; people tend to specialize in narrower domains — and even a narrow domain is enough for a lifetime of study.&lt;/p&gt;
&lt;p&gt;Academia today faces this exact situation. A relatively successful paper typically requires experts from various fields to use their specialized knowledge to verify the correctness of one small segment of a proof. The book gives a classic example: if a mathematician proves a conjecture in mathematics, there may be only a handful of people in the entire world who can truly understand what the mathematician is proving. Most of them may only understand one section of the content — and even the mathematician themselves may only say: &amp;ldquo;I might be right.&amp;rdquo; We have no way to verify the correctness of the proof.&lt;/p&gt;
&lt;p&gt;Human knowledge is becoming increasingly complex. Scientists now tend to, and increasingly do, hand calculations and experiments over to machines. Humanity seems to have reached a tipping point: our brains are nearly incapable of understanding this knowledge anymore. From a biological perspective, the human brain necessarily has a limit. The processing speed of the human brain can&amp;rsquo;t remotely keep up with machines.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Self-Learning
 &lt;div id="self-learning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#self-learning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Even as learning costs rise, education will become ever more important in the future. The education system may change. Since time will be more precious in the future world and learning resources will be easier to access, people will lean toward online learning and self-directed learning. At the same time, this makes self-drive even more critical.&lt;/p&gt;
&lt;p&gt;As an IT professional, I have a deep appreciation for self-learning. This industry is intensely competitive; if you don&amp;rsquo;t keep learning, you&amp;rsquo;re basically on the brink of obsolescence. But highly effective learning is also reflected in your salary. Our parents&amp;rsquo; generation relied on assigned jobs and could work in one position for decades without major changes. People back then just thought about working, not obsessively self-improving and chasing certifications. Times have truly changed. How many people, like me, are still writing articles at 11 PM? I&amp;rsquo;m even baffled by industries where you don&amp;rsquo;t need to keep learning after graduation — just how backward are they? You graduate university in your early twenties and still have decades to learn. It would be utterly strange to just stagnate there. Of course, I don&amp;rsquo;t like cutthroat competition, but I like standing still even less — especially in an age where just sitting on a stool spacing out causes the wealth gap to widen.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Finally
 &lt;div id="finally" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#finally" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The cover and illustrations for this post are all AI-generated. I just typed in &amp;ldquo;goodbye age of mediocrity&amp;rdquo; and the AI produced astonishing images. I don&amp;rsquo;t know exactly which industries or professions will disappear in the future, but at the very least, illustrators are going to have a hard time surviving in the AI era.&lt;/p&gt;
&lt;p&gt;AI has already invaded the IT domain. As a DBA, which of our work patterns will be replaced? That&amp;rsquo;s a question worth pondering. Whatever happens, in this age, only learning can keep you competitive. I hope none of us will be the &amp;ldquo;disappearing shoulder pole porter.&amp;rdquo;&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Educated and Atomic Habits</title><link>https://lastdba.com/en/2024/08/13/book-notes-educated-and-atomic-habits/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/book-notes-educated-and-atomic-habits/</guid><description>&lt;p&gt;I&amp;rsquo;ve actually wanted to write about these two books for a long time. I love reading, but I absolutely detest writing. Maintaining a blog is practically a miracle for me. I love reading because I love and believe in the power of education. As for how much I hate writing, let me tell you a little story.&lt;/p&gt;

&lt;h2 class="relative group"&gt;I Hate Writing Essays
 &lt;div id="i-hate-writing-essays" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#i-hate-writing-essays" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;My dislike for writing is practically innate. Since elementary school, I never wrote diaries or essays. Every winter and summer break homework required a daily diary entry — I never wrote a single one. I still remember when school started and I had to turn in summer homework. The teacher threatened that if I didn&amp;rsquo;t finish it, I couldn&amp;rsquo;t attend class. I still wrote nothing and just sat in the classroom as usual. Later, for some assignment, our homeroom teacher had to submit examples of correcting typically flawed students — only 4 or 5 kids in the class were selected. One was bad at sports, one had a temper problem&amp;hellip; and I was the one bad at writing! And the task was to write an essay about correcting that flaw! I can&amp;rsquo;t write — why would you make me write an essay about fixing my inability to write??? I dragged it out for two weeks. All the other students turned theirs in. I couldn&amp;rsquo;t squeeze out a single word. The homeroom teacher personally coached me. She said: when you walk down the street, you can turn anything you see into a sentence. See a blue sky? You can form a sentence in your mind: &amp;ldquo;The sky is cloudless for miles.&amp;rdquo; Practicing sentence construction regularly can help with writing. You could also think of other ways to fix your writing aversion. Another week passed, and I wrote down exactly what she told me, verbatim. I could see the frustration in her eyes. Later, in middle school, I cleverly befriended the Chinese class representative so she&amp;rsquo;d leave my name off the missing-homework list. That&amp;rsquo;s how I dodged three years of middle school. Then in high school, I once awkwardly wrote an essay my own way and scored 30 out of 60 — a devastating blow. So for every monthly exam, I simply didn&amp;rsquo;t write the essay. I figured: it&amp;rsquo;s just mock exams, not the Gaokao — I&amp;rsquo;ll just forfeit those 60 points. Finally, for the actual Gaokao and the two mock exams before it, I crammed Qu Yuan and Li Bai into essay templates like eight-legged essays. I found there was nothing you couldn&amp;rsquo;t cram them into, and I muddled through the Gaokao essay hurdle. College? No need to mention it — my hand had forgotten how to hold a pen.&lt;/p&gt;</description><content:encoded>&lt;p&gt;I&amp;rsquo;ve actually wanted to write about these two books for a long time. I love reading, but I absolutely detest writing. Maintaining a blog is practically a miracle for me. I love reading because I love and believe in the power of education. As for how much I hate writing, let me tell you a little story.&lt;/p&gt;

&lt;h2 class="relative group"&gt;I Hate Writing Essays
 &lt;div id="i-hate-writing-essays" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#i-hate-writing-essays" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;My dislike for writing is practically innate. Since elementary school, I never wrote diaries or essays. Every winter and summer break homework required a daily diary entry — I never wrote a single one. I still remember when school started and I had to turn in summer homework. The teacher threatened that if I didn&amp;rsquo;t finish it, I couldn&amp;rsquo;t attend class. I still wrote nothing and just sat in the classroom as usual. Later, for some assignment, our homeroom teacher had to submit examples of correcting typically flawed students — only 4 or 5 kids in the class were selected. One was bad at sports, one had a temper problem&amp;hellip; and I was the one bad at writing! And the task was to write an essay about correcting that flaw! I can&amp;rsquo;t write — why would you make me write an essay about fixing my inability to write??? I dragged it out for two weeks. All the other students turned theirs in. I couldn&amp;rsquo;t squeeze out a single word. The homeroom teacher personally coached me. She said: when you walk down the street, you can turn anything you see into a sentence. See a blue sky? You can form a sentence in your mind: &amp;ldquo;The sky is cloudless for miles.&amp;rdquo; Practicing sentence construction regularly can help with writing. You could also think of other ways to fix your writing aversion. Another week passed, and I wrote down exactly what she told me, verbatim. I could see the frustration in her eyes. Later, in middle school, I cleverly befriended the Chinese class representative so she&amp;rsquo;d leave my name off the missing-homework list. That&amp;rsquo;s how I dodged three years of middle school. Then in high school, I once awkwardly wrote an essay my own way and scored 30 out of 60 — a devastating blow. So for every monthly exam, I simply didn&amp;rsquo;t write the essay. I figured: it&amp;rsquo;s just mock exams, not the Gaokao — I&amp;rsquo;ll just forfeit those 60 points. Finally, for the actual Gaokao and the two mock exams before it, I crammed Qu Yuan and Li Bai into essay templates like eight-legged essays. I found there was nothing you couldn&amp;rsquo;t cram them into, and I muddled through the Gaokao essay hurdle. College? No need to mention it — my hand had forgotten how to hold a pen.&lt;/p&gt;
&lt;p&gt;Yes, with this peculiar writing psychology, I hated essays. But after entering the workforce, I gradually understood: the dullest pencil is better than the sharpest memory. No matter how many books you read, you need to internalize them. The pressures of ambition, family, and work forced me to change. Whether I needed this skill or not didn&amp;rsquo;t seem to matter — if society needs it, I should try to adapt. Writing not only pushes you forward, it&amp;rsquo;s also a way to record growth, to record life. Even my own technical articles — years later, I still have to come back and read them carefully, review them carefully.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Reading Originals Is Good for Body and Mind
 &lt;div id="reading-originals-is-good-for-body-and-mind" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reading-originals-is-good-for-body-and-mind" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I read both &lt;em&gt;Educated&lt;/em&gt; and &lt;em&gt;Atomic Habits&lt;/em&gt; in their original English. I had a bit of an English foundation, and since I was preparing for graduate school entrance exams at the time, I wanted to improve my English reading — so I chose English originals. At first, reading English originals was quite difficult. Many words were unfamiliar, and I&amp;rsquo;d look them up and annotate them in the book. Progress was painfully slow. But as I read deeper, there were fewer and fewer annotations. It wasn&amp;rsquo;t that I quickly memorized many new words — rather, some important words appear repeatedly throughout a book, while others that appear rarely don&amp;rsquo;t affect comprehension. Also, at the start you don&amp;rsquo;t know what the book is about, so comprehension latency is high. Later, once you know where it&amp;rsquo;s headed, reading naturally speeds up. For instance, the word &amp;ldquo;ridge&amp;rdquo; appeared very frequently early on, and I eventually remembered it. Some similar words I still can&amp;rsquo;t remember, but I know they&amp;rsquo;re some kind of geographic term — summit, valley, ridge — and even without remembering them precisely, it doesn&amp;rsquo;t stop me from reading. That&amp;rsquo;s how English originals work: difficult at first, faster the further you go.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Educated
 &lt;div id="educated" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#educated" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The Chinese title of &lt;em&gt;Educated&lt;/em&gt; is &lt;em&gt;You Should Fly Like a Bird to Your Mountain&lt;/em&gt; — I really want to complain about this title. &amp;ldquo;Educated&amp;rdquo; is the spiritual essence of the entire book, and near the end, the author drives it home with &amp;ldquo;call it educated&amp;rdquo; — absolutely brilliant. This Chinese title is like shit, completely missing the book&amp;rsquo;s essence. The author&amp;rsquo;s personal experience is legendary: a child who walked out of some corner of the American mountains, who through sheer effort studied all the way to Cambridge. Her father was uneducated, anti-social, lacking basic physics knowledge — his ignorance led to family members getting injured or even disabled. He disapproved of children going to school, even believing education was government brainwashing. Countless absurd behaviors. Her brother also had personality issues — he shoved her head into a filthy toilet and made her beg for mercy, then the next day acted like nothing happened and continued being her &amp;ldquo;good brother&amp;rdquo;&amp;hellip; Later, the author found her way out through education, and in the end, she didn&amp;rsquo;t want to return to that valley. Reading the ending always reminds me of my own experience. Of course, I didn&amp;rsquo;t have such an extreme environment, nor such a legendary journey, but I feel like I can understand — after being educated, family interactions somehow feel unnatural. It&amp;rsquo;s not about getting cocky after university — the generation gap is real. I deeply believe in the importance of education. If my family hadn&amp;rsquo;t sold everything they had to fully support my education, our circumstances would never have changed. If you&amp;rsquo;ve truly been mired in poverty, you know how fierce the desire to escape it is — and education is almost the only way out for people like us. &lt;em&gt;Educated&lt;/em&gt; is a great book: clear prose, comfortable sentence structure, suited to modern reading rhythms, a gripping story, a profound theme. It&amp;rsquo;s an excellent choice as your first English original.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Atomic Habits
 &lt;div id="atomic-habits" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#atomic-habits" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Atomic Habits&lt;/em&gt; — I&amp;rsquo;ve forgotten exactly how I found this book, but it changed my understanding of behavior. Building good habits isn&amp;rsquo;t actually that hard; most people just don&amp;rsquo;t know how. Many have said: I&amp;rsquo;ll read X books in a few months, run Y kilometers, lose Z pounds — but they rarely follow through. Building good habits requires genuinely liking the habit, changing your mindset, reducing the friction of the action, putting obstacles farther away, forming reward mechanisms, and so on. When you want to become a certain kind of person, don&amp;rsquo;t focus on how to become that person — think about what that kind of person &lt;em&gt;does&lt;/em&gt;, and learn to do it. For example, quitting smoking: if your brain thinks you&amp;rsquo;re &amp;ldquo;in the process of quitting,&amp;rdquo; it&amp;rsquo;s very hard. If someone offers you a cigarette and you say &amp;ldquo;I&amp;rsquo;m quitting,&amp;rdquo; a few words from them might get you to smoke. But if you genuinely believe you&amp;rsquo;re someone who &amp;ldquo;doesn&amp;rsquo;t smoke&amp;rdquo; — note, this must be your authentic inner belief — when someone offers you a cigarette, you&amp;rsquo;ll simply say &amp;ldquo;I don&amp;rsquo;t smoke,&amp;rdquo; and you probably won&amp;rsquo;t have to smoke it. Some small details: say you want to build a habit of reading at night — you need to break the habit of scrolling on your phone. Move your books from the bookshelf to your bedside for easier access. Put your phone at the foot of the bed, making getting up the barrier to grabbing the phone — this makes it easier to reach for the book instead of the phone. If picking up the book is still hard, reframe your thinking: &amp;ldquo;reading&amp;rdquo; as an action may feel difficult, but break it down — &amp;ldquo;pick up the book&amp;rdquo; or &amp;ldquo;open to the first page&amp;rdquo; becomes your mental target. The startup action for reading is simple and easy to complete. After reading the first page, think about what comes next — and in reality, once you&amp;rsquo;ve read the first page, it&amp;rsquo;s hard not to read the second. Of course, there are many more excellent suggestions for building good habits and shedding bad ones — every word is a gem, thoroughly engaging. After reading &lt;em&gt;Atomic Habits&lt;/em&gt;, whenever I want a certain habit, I first consider the book&amp;rsquo;s guidance, then plan how to implement it — rather than acting on impulse.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Finally
 &lt;div id="finally" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#finally" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;At last — these two books have had an enormous impact on me. One is a legendary autobiography; the other is a behavior-transforming book. Neither is the kind of work you forget shortly after reading. They&amp;rsquo;re perfect starter books for cultivating a reading habit, especially for those wanting to read English originals. I really don&amp;rsquo;t recommend &lt;em&gt;Pride and Prejudice&lt;/em&gt; or &lt;em&gt;One Hundred Years of Solitude&lt;/em&gt; — yes, they&amp;rsquo;re classics, but their impact on the reader is quite low, and they were written so long ago that some vocabulary and grammar are too archaic, making them unsuitable for first-time English readers. Looking at this through the lens of &lt;em&gt;Atomic Habits&lt;/em&gt;: reading these English classics is not only more difficult but also lacks immediate personal benefit, making it hard to form a habit.&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Elon Musk</title><link>https://lastdba.com/en/2024/08/13/book-notes-elon-musk/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/book-notes-elon-musk/</guid><description>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9e00679bd411.png" alt="abc" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Gifted
 &lt;div id="gifted" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gifted" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Musk&amp;rsquo;s ancestors, driven by a love of adventure, emigrated from America to South Africa. His maternal grandfather even flew a plane from Africa to Australia. Musk was born in South Africa and showed astonishing memory and brilliance from an early age. His mother, Maye Musk, told his teacher: &amp;ldquo;My son is a genius.&amp;rdquo; The teacher replied, &amp;ldquo;Yes, every mother says that.&amp;rdquo; Maye: &amp;ldquo;No, I mean he really is a genius.&amp;rdquo; As a child, Musk sometimes seemed &amp;ldquo;slow to react.&amp;rdquo; His mother said when people talked to him, he&amp;rsquo;d give no response at all. She thought something was wrong with his brain and even took him to a doctor. But later she discovered Musk was simply immersed in his own world of thought. As a child, Musk could even finish reading the entire library&amp;rsquo;s collection and then ask the library to get more books&amp;hellip;&lt;/p&gt;</description><content:encoded>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9e00679bd411.png" alt="abc" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Gifted
 &lt;div id="gifted" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gifted" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Musk&amp;rsquo;s ancestors, driven by a love of adventure, emigrated from America to South Africa. His maternal grandfather even flew a plane from Africa to Australia. Musk was born in South Africa and showed astonishing memory and brilliance from an early age. His mother, Maye Musk, told his teacher: &amp;ldquo;My son is a genius.&amp;rdquo; The teacher replied, &amp;ldquo;Yes, every mother says that.&amp;rdquo; Maye: &amp;ldquo;No, I mean he really is a genius.&amp;rdquo; As a child, Musk sometimes seemed &amp;ldquo;slow to react.&amp;rdquo; His mother said when people talked to him, he&amp;rsquo;d give no response at all. She thought something was wrong with his brain and even took him to a doctor. But later she discovered Musk was simply immersed in his own world of thought. As a child, Musk could even finish reading the entire library&amp;rsquo;s collection and then ask the library to get more books&amp;hellip;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Arriving in America
 &lt;div id="arriving-in-america" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#arriving-in-america" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Due to the less-than-ideal environment in South Africa, Musk, approaching university age, executed a two-step jump. He first went to university in Canada, then to the United States for his master&amp;rsquo;s. Upon finally reaching America, Musk immersed himself in Silicon Valley&amp;rsquo;s work environment. The tech industry desperately needed young people like him — brilliant and relentless. And Silicon Valley&amp;rsquo;s tech atmosphere and culture of freely exercising one&amp;rsquo;s talents let Musk dive in completely.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Zip2 and PayPal
 &lt;div id="zip2-and-paypal" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#zip2-and-paypal" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Soon Musk founded Zip2, essentially a corporate version of online maps. While we&amp;rsquo;re now very familiar with online maps, the US internet industry was just getting started back then — this was all novel stuff. After many twists and turns, Zip2 did grow. Personally, I think Zip2&amp;rsquo;s model would have struggled to survive long-term without pivoting toward online maps or something like Yelp. Eventually, some sucker bought Zip2 for $300 million, instantly turning Musk into a multimillionaire and Silicon Valley tech tycoon. You could actually tell — Zip2 was deeply divided internally, had directional problems, and Musk didn&amp;rsquo;t have absolute decision-making power. He probably wanted out long ago.&lt;/p&gt;
&lt;p&gt;Before leaving Zip2, Musk was already planning and recruiting for online payments. At that time, the world didn&amp;rsquo;t even have anything like Alipay&amp;hellip; Musk believed traditional finance was too conservative and that there was enormous opportunity to change the industry model. But many bankers didn&amp;rsquo;t believe internet finance could work, because internet finance couldn&amp;rsquo;t handle network security issues — after all, the slightest error in finance could have enormous consequences. Initially, the company Musk founded wasn&amp;rsquo;t PayPal but X.com, which later merged with PayPal and kept the latter&amp;rsquo;s name. Early on, X.com suffered massive attacks but survived. Their security mechanisms at the time had a significant influence on the later online payments industry. PayPal was later acquired by eBay, netting Musk hundreds of millions of dollars — another huge payday.&lt;/p&gt;

&lt;h2 class="relative group"&gt;SpaceX and Tesla
 &lt;div id="spacex-and-tesla" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#spacex-and-tesla" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Zip2 and PayPal were, for Musk, validation of his industry sensitivity and business acumen — though some questioned his execution and decision-making abilities, i.e., his CEO chops. As always, Musk viewed these industries as too conservative and old-fashioned. Musk loved recruiting extremely capable top university graduates and disliked hiring seasoned, conservative-minded industry veterans. He ran both companies simultaneously, and for a long time, neither company produced any product at all. And, as you&amp;rsquo;d imagine, rocket-building burns through money like nothing else. After several failed rocket launches, Musk deployed his signature skill: fire&amp;hellip; And just as the financial crisis hit and no one wanted to invest, he poured his entire personal fortune into both companies. After several failures, SpaceX&amp;rsquo;s Falcon rocket finally achieved the feat of being the first private company to successfully launch a satellite, landing a $1 billion NASA contract. Tesla, after shamelessly asking early Roadster customers for more money (because developing such a radically new-concept EV cost far more than projected), finally produced a finished vehicle and built out a highway EV charging network and an electric car factory. After simultaneously succeeding with two industry-disrupting companies, no one questioned Musk&amp;rsquo;s ability anymore.&lt;/p&gt;

&lt;h2 class="relative group"&gt;For Humanity
 &lt;div id="for-humanity" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#for-humanity" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Musk&amp;rsquo;s success is inseparable from his excellent qualities: sensitivity to future technology, rapid comprehension of new industries, talent identification, a free and open tech and market environment, long working hours and execution&amp;hellip; But the things people dislike include his ruthlessness toward employees — some loyal, devoted people, fired just like that. As a worker myself, I deeply understand the feeling of giving your all without recognition from the company. Reading this book, I could even feel how American capitalists truly exploit workers. Once, an employee missed a company gathering because he didn&amp;rsquo;t want to miss his daughter&amp;rsquo;s birth. Musk emailed him an angry tirade: do you want to wallow in domestic trivialities or work relentlessly to change the world? The guy just didn&amp;rsquo;t want to miss his daughter&amp;rsquo;s birth.&lt;/p&gt;
&lt;p&gt;A few years ago, reading &lt;em&gt;Steve Jobs&lt;/em&gt;, I thought: how could someone be so obsessive? But that exact kind of person changed the mobile industry and brought about the smartphone revolution. Jobs was way too formidable. After reading &lt;em&gt;Elon Musk&lt;/em&gt;, I now feel Musk is even stronger than Jobs. Tesla, SpaceX, SolarCity — they&amp;rsquo;re all oriented toward humanity&amp;rsquo;s future. The future world seems to have started its engines; you can see it slowly arriving.&lt;/p&gt;
&lt;p&gt;Musk&amp;rsquo;s Mars plan finally seems to have glimpsed some dawn. For decades, the American space industry had nearly stagnated. He brought a new model and once again made aerospace a hot field. But there are also uncertainties. If a crewed launch explodes and causes casualties, SpaceX could plunge back into the abyss. And if Tesla discovers a serious defect requiring a mass recall, the stock price would crash.&lt;/p&gt;
&lt;p&gt;If you could be the first human to set foot on Mars, would you do it? Musk has thought about it, and he truly could become that person. But Musk wouldn&amp;rsquo;t do it. The book&amp;rsquo;s original words: I want to go, but I don&amp;rsquo;t have to. The point is to enable many people to go to Mars. It would be like the head of Boeing being a test pilot — for space exploration, that&amp;rsquo;s unwise. Even never going to space is fine. The point is to extend the lifespan of humanity as much as possible.&lt;/p&gt;
&lt;p&gt;Working for humanity — this theme truly stirs the heart. I&amp;rsquo;ve played &lt;em&gt;Civilization VI&lt;/em&gt; for days and nights on end, from stick-wielding primitives to igniting rockets, all for that moment of launch, when humanity becomes an interplanetary species and builds a new home on Mars!&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Rich Dad Poor Dad</title><link>https://lastdba.com/en/2024/08/13/book-notes-rich-dad-poor-dad/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/book-notes-rich-dad-poor-dad/</guid><description>&lt;p&gt;&lt;em&gt;Rich Dad Poor Dad&lt;/em&gt;. I used to scoff at this kind of book. It&amp;rsquo;s the type of success-literature you see displayed at bookstore entrances, looking insubstantial at a glance, very unreliable — the kind of thing that seems to prey on people at the bottom of society who dream of getting rich quick but can never actually apply the book&amp;rsquo;s advice due to their own circumstances or environment. Besides, smart people don&amp;rsquo;t read such uncultured books, right? The title is tacky as hell!&lt;/p&gt;</description><content:encoded>&lt;p&gt;&lt;em&gt;Rich Dad Poor Dad&lt;/em&gt;. I used to scoff at this kind of book. It&amp;rsquo;s the type of success-literature you see displayed at bookstore entrances, looking insubstantial at a glance, very unreliable — the kind of thing that seems to prey on people at the bottom of society who dream of getting rich quick but can never actually apply the book&amp;rsquo;s advice due to their own circumstances or environment. Besides, smart people don&amp;rsquo;t read such uncultured books, right? The title is tacky as hell!&lt;/p&gt;
&lt;p&gt;I spent a period watching &lt;em&gt;Lao Gao and Xiao Mo&lt;/em&gt; on Bilibili, and one episode talked about this book, making it sound mystical and mysterious. Plus, it&amp;rsquo;s a global bestseller, so I bought it to see just how magical it really was.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Started Reading E-Books
 &lt;div id="started-reading-e-books" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#started-reading-e-books" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m a die-hard fan of paper books. I love the feeling of finishing an entire book and placing it on the shelf to collect — that &amp;ldquo;this whole bookshelf is my knowledge&amp;rdquo; feeling. I wasn&amp;rsquo;t really into e-books; they give a &amp;ldquo;read it and it&amp;rsquo;s gone&amp;rdquo; vibe. Three reasons brought me back to e-books:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;I recently deleted all my go-to time-killing apps and needed an app that wasn&amp;rsquo;t so brain-numbing — something I could open first when pulling out my phone.&lt;/li&gt;
&lt;li&gt;E-books are just more convenient than paper books; you can pull them out and read anytime.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Making use of fragmented subway commute time. Back when I was preparing for graduate school entrance exams, I made a detailed daily schedule that included subway time. Since I&amp;rsquo;d already built the habit of studying on the subway, I didn&amp;rsquo;t want to give it up. I recommend my summary of the graduate exam experience: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/125101488?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;How I Got Into Wuhan University&amp;rsquo;s Part-Time Graduate Program&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I slightly adjusted my old plan — no need to grind vocabulary as intensely anymore — so I swapped in e-book reading. I split subway time into two blocks: morning commute and evening commute.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f3d9749383d1.png" alt="abc" /&gt;&lt;/p&gt;
&lt;p&gt;In the morning, my mind is clear and my mental state is good, so I read technical e-books. These require slow reading, sometimes stopping to think. This is goal-driven reading.&lt;/p&gt;
&lt;p&gt;In the evening, my mind is foggy (not really foggy, more often it&amp;rsquo;s a headache), so I read lighter books — like &amp;ldquo;extracurricular&amp;rdquo; books such as &lt;em&gt;Rich Dad Poor Dad&lt;/em&gt;. These books aren&amp;rsquo;t hardcore in content, so I read fast and enjoyably, with a bit of a dopamine-driven reading feel.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Poor Dad and Rich Dad
 &lt;div id="poor-dad-and-rich-dad" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#poor-dad-and-rich-dad" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Author Robert Kiyosaki grew up in Hawaii. His biological father was a highly educated government education official — the &amp;ldquo;Poor Dad.&amp;rdquo; His best friend&amp;rsquo;s father was a high school dropout with extraordinary financial intelligence — the &amp;ldquo;Rich Dad.&amp;rdquo; Poor Dad had higher education but worried daily about loans and bills, while Rich Dad spent every day directing people to create wealth for him. The book has a classic line: &amp;ldquo;The poor work for money; the rich make money work for them.&amp;rdquo;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The mindset of making money&lt;/strong&gt;: As a child, the author and his good friend came up with many ways to earn money. They once used toothpaste tubes to counterfeit coins — not knowing at the time that making money was illegal — and were stopped by adults. Later, they gathered free books from stores, set up a little library in a spot, and earned money by renting books to neighborhood kids. They stopped after attracting some local unsavory characters. Rich Dad admired their money-making behavior. He believed the difference between the rich and the poor is that the rich are always thinking about how to make money, while the poor are only thinking about how to find a good job.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On taxes&lt;/strong&gt;: The poor pay far more in taxes than the rich. Rich Dad paid more taxes than Poor Dad, but Rich Dad&amp;rsquo;s income was vastly higher. When the US president decided to raise taxes on the rich, they only raised taxes on the salaried middle class — the truly rich were unaffected. The rich have many ways to legally avoid taxes, by understanding how to use the law. For example, the author says in real estate: if you sell a house, the income is heavily taxed, but if you swap houses, there&amp;rsquo;s no tax. The rich can use this statute to invest in real estate and legally avoid taxes. But the poor can&amp;rsquo;t escape income tax — the more you earn, the more tax you pay.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On investing&lt;/strong&gt;: Investing requires cultivating knowledge in accounting, finance, and law. In short, if you want to make money, you need to develop your financial intelligence. When you earn a big sum, you should start the next investment rather than buying consumer goods. Think of the tables, chairs, jars, bottles, clothes, and household items in our homes — we pay a relatively high price for them, but the moment they&amp;rsquo;re bought, their value drops to near zero. This isn&amp;rsquo;t to say don&amp;rsquo;t buy things — but consider investing your money first, then consider zero-return consumption.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Finally
 &lt;div id="finally" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#finally" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The book says our education system cultivates people&amp;rsquo;s ability to work, not their ability to make money. I strongly agree with this statement, but I still believe in the power of education. The author isn&amp;rsquo;t telling people to skip education — education is also very important for making money. We need to understand the basic operating principles and rules of this world, and that can help us find suitable ways to make money.&lt;/p&gt;
&lt;p&gt;Given the author&amp;rsquo;s family background at the time, they may have been relatively poor compared to Rich Dad, but for truly poor people, their family conditions were far from poor. I feel my own circumstances haven&amp;rsquo;t yet reached the point where I can fully devote myself to investment and money-making. If one day I have some spare money and my management, social, and decision-making skills reach a certain level, I might look for ways to make money. For now, I can&amp;rsquo;t think that far — gotta code well and fill the holes first.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a5205d91518d.png" alt="ass" /&gt;&lt;/p&gt;
&lt;p&gt;The Learning Pyramid in the book left a deep impression on me. You really retain very little from passive learning. That&amp;rsquo;s why I persist in writing frequently, including book notes like these. Problems I encountered during years of late-night database maintenance — I still remember them vividly. Hands-on experience truly creates the deepest memories. That said, hands-on experience is unpredictable and rare; reading and self-learning are the lowest-cost, easiest-to-form habits, and the most cost-effective way to improve ability. They&amp;rsquo;re not &lt;em&gt;that&lt;/em&gt; &amp;ldquo;passive.&amp;rdquo; The Learning Pyramid&amp;rsquo;s &amp;ldquo;passive&amp;rdquo; refers to knowledge being received by the subject; &amp;ldquo;active&amp;rdquo; refers to the subject outputting knowledge. This also strengthens my motivation to record and share — whether technical or non-technical.&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Sapiens: A Brief History of Humankind</title><link>https://lastdba.com/en/2024/08/13/book-notes-sapiens-a-brief-history-of-humankind/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/book-notes-sapiens-a-brief-history-of-humankind/</guid><description>&lt;p&gt;This is a book I spent a long time reading. It&amp;rsquo;s thick, covers an enormous range of topics, and tackling the original English edition was challenging. But thankfully, I finally finished it — today (February 2023). A real sense of accomplishment.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Sapiens: A Brief History of Humankind&lt;/em&gt; is a grand history book that comprehensively introduces the development of human civilization. I&amp;rsquo;ve always enjoyed learning about human history, immersing myself in its weight and the vitality of civilizational progress.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Cognitive Revolution and Fiction
 &lt;div id="the-cognitive-revolution-and-fiction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-cognitive-revolution-and-fiction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Conventional views of human history hold that humanity&amp;rsquo;s first major evolution or revolution was learning to use tools. Like in &lt;em&gt;2001: A Space Odyssey&lt;/em&gt;, where apes bang bones together as the iconic BGM plays — but that&amp;rsquo;s science fiction. &lt;em&gt;Sapiens&lt;/em&gt; argues that humanity&amp;rsquo;s first major revolution was the Cognitive Revolution, the key distinction between humans and animals. Learning to walk upright didn&amp;rsquo;t just free our hands — more importantly, it freed our minds. Four-legged running animals never evolved the way we did because harsh natural environments demanded stronger bodies and limbs for speed. Walking upright obviously makes you slower, so group living and tool use compensated. But group living and tools aren&amp;rsquo;t unique to Sapiens — many animals live in groups, and chimpanzees use tools too. What set Sapiens apart was learning to manufacture weapons, boats, and sustain much larger social groups. They walked from Africa to the Middle East, to Europe, battling the physically stronger Neanderthals and ultimately taking their territory. They reached the Far East, crossed the Bering Strait into the Americas, and even sailed to Australia. This ability to craft complex tools and communicate at unprecedented levels — that&amp;rsquo;s what the Cognitive Revolution brought.&lt;/p&gt;</description><content:encoded>&lt;p&gt;This is a book I spent a long time reading. It&amp;rsquo;s thick, covers an enormous range of topics, and tackling the original English edition was challenging. But thankfully, I finally finished it — today (February 2023). A real sense of accomplishment.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Sapiens: A Brief History of Humankind&lt;/em&gt; is a grand history book that comprehensively introduces the development of human civilization. I&amp;rsquo;ve always enjoyed learning about human history, immersing myself in its weight and the vitality of civilizational progress.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Cognitive Revolution and Fiction
 &lt;div id="the-cognitive-revolution-and-fiction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-cognitive-revolution-and-fiction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Conventional views of human history hold that humanity&amp;rsquo;s first major evolution or revolution was learning to use tools. Like in &lt;em&gt;2001: A Space Odyssey&lt;/em&gt;, where apes bang bones together as the iconic BGM plays — but that&amp;rsquo;s science fiction. &lt;em&gt;Sapiens&lt;/em&gt; argues that humanity&amp;rsquo;s first major revolution was the Cognitive Revolution, the key distinction between humans and animals. Learning to walk upright didn&amp;rsquo;t just free our hands — more importantly, it freed our minds. Four-legged running animals never evolved the way we did because harsh natural environments demanded stronger bodies and limbs for speed. Walking upright obviously makes you slower, so group living and tool use compensated. But group living and tools aren&amp;rsquo;t unique to Sapiens — many animals live in groups, and chimpanzees use tools too. What set Sapiens apart was learning to manufacture weapons, boats, and sustain much larger social groups. They walked from Africa to the Middle East, to Europe, battling the physically stronger Neanderthals and ultimately taking their territory. They reached the Far East, crossed the Bering Strait into the Americas, and even sailed to Australia. This ability to craft complex tools and communicate at unprecedented levels — that&amp;rsquo;s what the Cognitive Revolution brought.&lt;/p&gt;
&lt;p&gt;Neanderthals themselves have gone extinct, but recent research shows the vast majority of humans carry a small amount of Neanderthal DNA — except for indigenous Africans. This suggests Neanderthals weren&amp;rsquo;t entirely wiped out by Sapiens; a small number interbred with Sapiens and their genes spread across the world. This is also key evidence supporting the Out-of-Africa theory of human origins.&lt;/p&gt;
&lt;p&gt;The book gives a classic example of the Cognitive Revolution: imagine a lion by the river. One Sapiens sees it and tells others. The others then construct in their minds the idea that &amp;ldquo;there is a lion by the river&amp;rdquo; — even though they don&amp;rsquo;t know for certain whether one is actually there. The prerequisite is that Sapiens had to learn to conceive of things that aren&amp;rsquo;t immediately present. More importantly, once they mastered this skill, language, fiction, lies, power, social structures followed&amp;hellip; Neanderthals clearly exchanged far less information than Sapiens.&lt;/p&gt;
&lt;p&gt;The Cognitive Revolution had an enormous impact on civilizational development. It allowed the construction of things that don&amp;rsquo;t actually exist — gods, religions, power, money, social structures, dynasties&amp;hellip; Take a company, for example. A company is really a social construct; it doesn&amp;rsquo;t actually exist in the physical world. A company can be a stack of 4A paper with a stamp in a document bag — but that&amp;rsquo;s just paper. Employees believe the company exists because their minds believe it does. Everyone believes it exists, but the company itself is a fiction in human minds — the entity &amp;ldquo;company&amp;rdquo; does not exist in the real world.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Money
 &lt;div id="money" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#money" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;How did money come about? In a world without money, stable social structures gave rise to barter trade. But as the variety of traded goods increased, the number of equivalent exchange equations grew exponentially. When trading shoes for a rake, it&amp;rsquo;s a simple one-for-one swap. Add a donkey, and you have three exchange equations: shoes-rake, rake-donkey, shoes-donkey. As goods multiply, the number of exchange equations becomes a combinatorial explosion — and that&amp;rsquo;s not even accounting for multi-item trades. Then an intermediary — money — appeared and solved the problem instantly. Everything only needed to be equated with money. Money served as the universal equivalent for all goods, and the convenience of trade improved dramatically. Early forms of money were diverse, with shells being the most common. If shells were too easy to obtain, someone could buy up everything in the market, so shell-based monetary civilizations were typically inland. Since people carried money in their pockets to buy things or hoarded it at home, and worried that too-easy acquisition of currency would disrupt markets, gold — rare, resistant to decay, difficult to mine — became humanity&amp;rsquo;s primary currency for long periods. In ancient Europe, many kings minted gold coins bearing their portraits or logos, resulting in a vast variety of European gold coins. Ancient China was somewhat different: starting with shells unearthed at Sanxingdui, then bronze coins during the Spring and Autumn and Warring States periods, then gold, silver, copper, and paper money (jiaozi) across dynasties. China didn&amp;rsquo;t stick to gold like Europe did, mainly because the population was too large and gold reserves too small, making gold too valuable — they needed other metals to create a monetary gradient to smooth trade across different scales.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s another great insight in the book: money and religion both have a certain transmissibility. Money and religion are essentially no different — they are both human constructs, fictions. Their only difference: religion tells you what &lt;em&gt;you&lt;/em&gt; should believe, while money tells you what &lt;em&gt;others&lt;/em&gt; believe.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Columbus and Zheng He
 &lt;div id="columbus-and-zheng-he" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#columbus-and-zheng-he" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The Age of Discovery, the early Industrial Revolution. Europeans were passionate about exploring the world&amp;rsquo;s unknown territories. After Europeans learned the Earth was round, lacking good surveying tools, Columbus set sail westward from Europe aiming for India. They crossed the Atlantic and reached a landmass, encountered the locals, thought they&amp;rsquo;d reached India, and called them &amp;ldquo;Indians.&amp;rdquo; To this day, &amp;ldquo;Indian&amp;rdquo; in the United States carries both meanings. Europeans realized the world still had many corners untouched (at least by relatively modern civilization). They redrew world maps, filling unknown regions with sea monsters and leviathans. These maps are still widely used in video games — for instance, &lt;em&gt;Civilization VI&lt;/em&gt; uses sea monster maps for unexplored territory, waiting to be discovered. Europeans eagerly sought new lands, and soon South America, New Zealand, Australia, and countless small islands were discovered and claimed. Where local civilizations were too far behind — the Aztec, Native American, Māori, Tasmanian civilizations — they were brutally massacred, their lands occupied by white settlers.&lt;/p&gt;
&lt;p&gt;When the Aztec civilization encountered Spaniards clad in gleaming iron armor and wielding sharp iron swords, they thought those men were gods. They couldn&amp;rsquo;t comprehend such hard clothing and weapons — they must have been sent by the gods. And then they were deceived and slaughtered by &amp;ldquo;higher civilization.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Zheng He&amp;rsquo;s ships were called &amp;ldquo;dragon boats&amp;rdquo; (the original text says this, even includes illustrations — I think it may be a mistake, or Westerners assumed any ship with a dragon figurehead was a dragon boat). They were several times larger than Columbus&amp;rsquo;s ships and set sail one to two centuries earlier. Zheng He&amp;rsquo;s fleet, with far superior technology, discovered new lands but didn&amp;rsquo;t occupy them — they traded with the locals. The book argues that Europeans were more adventurous and aggressive, thus ushering in the Age of Discovery. It seems Europeans hold a relatively friendly view of the Ming Dynasty. In &lt;em&gt;Civilization VI&lt;/em&gt;, only three Chinese leaders appear: Qin Shi Huang, Wu Zetian, and Zhu Di — and only Zhu Di of the Ming Dynasty is the &amp;ldquo;tall build&amp;rdquo; development-focused leader.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Closing
 &lt;div id="closing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#closing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Retracing the development of human civilization lets us understand where we came from, what we&amp;rsquo;re doing now, and explore where we&amp;rsquo;re headed. This love for the subject is also why I enjoy strategy games like &lt;em&gt;Civilization VI&lt;/em&gt; and &lt;em&gt;Humankind&lt;/em&gt;. When you plant rice, domesticate horses, mine salt, iron, coal, oil, uranium&amp;hellip; there&amp;rsquo;s a thrill of human progress.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d like to close with a quote from &lt;em&gt;Civilization VI&lt;/em&gt;, a game I&amp;rsquo;ve played for over 400 hours: &amp;ldquo;From the first stirrings of life beneath the water&amp;hellip; to the great beasts of the Stone Age&amp;hellip; to man taking his first upright steps, you have come far. Now begins your greatest quest.&amp;rdquo;&lt;/p&gt;</content:encoded></item><item><title>Getting Started with pg_rewind</title><link>https://lastdba.com/en/2024/08/13/getting-started-with-pg_rewind/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/getting-started-with-pg_rewind/</guid><description>&lt;h2 class="relative group"&gt;What is pg_rewind?
 &lt;div id="what-is-pg_rewind" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-pg_rewind" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;pg_rewind is a PostgreSQL-provided tool. When the timelines of two PG instances diverge, pg_rewind can synchronize them. (For example, the primary is running, the standby failover has been running for a while — at this point the primary and standby timelines have diverged.)&lt;/p&gt;
&lt;p&gt;pg_rewind compares the sizes of files between the source and target, then copies differing files from source to target, including configuration files. However, it does not compare unchanged files, so pg_rewind runs efficiently on large databases with few changes.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;What is pg_rewind?
 &lt;div id="what-is-pg_rewind" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-pg_rewind" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;pg_rewind is a PostgreSQL-provided tool. When the timelines of two PG instances diverge, pg_rewind can synchronize them. (For example, the primary is running, the standby failover has been running for a while — at this point the primary and standby timelines have diverged.)&lt;/p&gt;
&lt;p&gt;pg_rewind compares the sizes of files between the source and target, then copies differing files from source to target, including configuration files. However, it does not compare unchanged files, so pg_rewind runs efficiently on large databases with few changes.&lt;/p&gt;
&lt;p&gt;pg_rewind can be used after a standby failover: even if the standby has been running independently for some time, it can be pulled back to the same state as the primary and become a standby again.&lt;/p&gt;
&lt;p&gt;During execution, pg_rewind compares the divergence point between primary (source) and standby (target), and transmits the primary&amp;rsquo;s WAL logs after the divergence point to the standby. Therefore, if the primary&amp;rsquo;s WAL after the divergence point is also lost, rewind won&amp;rsquo;t copy nonexistent WAL logs, and the standby will still fail to become a standby. The solution is to use restore.&lt;/p&gt;
&lt;p&gt;!!! When using pg_rewind, back up the target instance. pg_rewind directly overwrites the target database&amp;rsquo;s files. If rewind fails, the target database may be unable to start.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Using pg_rewind
 &lt;div id="using-pg_rewind" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-pg_rewind" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After a primary-standby switchover, the old primary continues running, causing timeline inconsistency. The old primary cannot start as a standby for the new primary.&lt;/p&gt;
&lt;p&gt;When attempting to start the standby, a timeline error appears:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOG: entering standby mode
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FATAL: requested timeline &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; is not a child of this server&lt;span style="color:#960050;background-color:#1e0010"&gt;&amp;#39;&lt;/span&gt;s history
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: Latest checkpoint is at 0/6000028 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/4000098.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOG: startup process &lt;span style="color:#f92672"&gt;(&lt;/span&gt;PID 22321&lt;span style="color:#f92672"&gt;)&lt;/span&gt; exited with exit code &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOG: aborting startup due to startup process failure
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOG: database system is shut down&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At this point, rewind is needed to realign the primary and standby.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure pg_hba on the current primary
Set up login permissions for the pg_rewind user to access the source database. hba changes require a database restart.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vi $source/pg_hba.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;host all pg 172.17.100.150/32 trust&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;pg_rewind requires a high-privilege user. Newer PG versions allow granting privileges; older versions should use a superuser.
My environment is PG 9.6, so I use the OS superuser directly.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;wal_log_hints = on parameter configuration
Append &lt;code&gt;wal_log_hints = on&lt;/code&gt; to the target database&amp;rsquo;s postgres.conf, then start and shut down the target database once (at this point the primary is running and the standby is shut down).&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vi $dest/postgres.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_log_hints &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Execute pg_rewind&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_sla&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ /pg/pg96/bin/pg_rewind --target-pgdata /pg/pg96data_pri --source-server&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;host=172.17.100.150 port=5433 user=pg password=oracle dbname=postgres&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;servers diverged at WAL position 0/4000098 on timeline &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rewinding from last common checkpoint at 0/4000028 on timeline &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Done!&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Configure standby parameters
Modify IP, port, directory, etc. in postgres.conf and recovery.conf. pg_rewind also copies configuration files over.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_pri&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ mv recovery.done recovery.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_pri&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ vi recovery.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_pri&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ vi postgres.conf&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Start the standby&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_pri&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ /pg/pg96/bin/pg_ctl -D /pg/pg96data_sla -l /pg/pg96data_sla/server.log start 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server starting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_sla&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ psql -p5433 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql &lt;span style="color:#f92672"&gt;(&lt;/span&gt;9.6.17&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# \x&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Expanded display is on.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# select * from pg_stat_replication ;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-&lt;span style="color:#f92672"&gt;[&lt;/span&gt; RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;]&lt;/span&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid | &lt;span style="color:#ae81ff"&gt;24766&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid | &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename | lzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name | walreceiver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr | 172.17.100.150
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname | 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port | &lt;span style="color:#ae81ff"&gt;47345&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start | 2021-07-30 07:44:05.582546+00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin | 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state | streaming
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sent_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flush_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;replay_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_priority | &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_state | async&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Common Issues
 &lt;div id="common-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#common-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;pg_rewind Error 1
 &lt;div id="pg_rewind-error-1" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_rewind-error-1" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;could not fetch remote file &lt;span style="color:#e6db74"&gt;&amp;#34;global/pg_control&amp;#34;&lt;/span&gt;: ERROR: must be superuser to read files
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Failure, exiting&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Solution: Use a high-privilege user.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# \du&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; List of roles
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Role name | Attributes | Member of 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-------------+------------------------------------------------------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl | Replication | &lt;span style="color:#f92672"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg | Superuser, Create role, Create DB, Replication, Bypass RLS | &lt;span style="color:#f92672"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; rewind_user | | &lt;span style="color:#f92672"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;pg&lt;/code&gt; user is the built-in superuser that comes with the PG server, matching the PG installation user. The OS installation user certainly has permission to modify pg_control.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/pg/pg96/bin/pg_rewind --target-pgdata /pg/pg96data_pri --source-server&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;host=172.17.100.150 port=5433 user=pg password=oracle dbname=postgres&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;pg_rewind Error 2
 &lt;div id="pg_rewind-error-2" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_rewind-error-2" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;could not connect to server: FATAL: no pg_hba.conf entry &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; host &lt;span style="color:#e6db74"&gt;&amp;#34;172.17.100.150&amp;#34;&lt;/span&gt;, user &lt;span style="color:#e6db74"&gt;&amp;#34;rewind_user&amp;#34;&lt;/span&gt;, database &lt;span style="color:#e6db74"&gt;&amp;#34;postgres&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Failure, exiting&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;No pg_hba.conf entry configured for the connection.
Solution: Configure pg_hba for the user, e.g.:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;host all pg 172.17.100.150/32 trust&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;pg_rewind Error 3
 &lt;div id="pg_rewind-error-3" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_rewind-error-3" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_sla&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ /pg/pg96/bin/pg_rewind --target-pgdata /pg/pg96data_pri --source-server&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;host=172.17.100.150 port=5433 user=pg password=oracle dbname=postgres&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;target server needs to use either data checksums or &lt;span style="color:#e6db74"&gt;&amp;#34;wal_log_hints = on&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Root causes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;full_page_writes (enabled by default)&lt;/li&gt;
&lt;li&gt;wal_log_hints must be set to on, or PG must have checksums enabled at initdb time.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Solution: Add &lt;code&gt;wal_log_hints = on&lt;/code&gt; to the target database&amp;rsquo;s postgres.conf, then start and shut down the target database once (the target was already shut down — it must be started and shut down again for the parameter to take effect).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vi postgres.conf &lt;span style="color:#75715e"&gt;# add to target database config&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_log_hints &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Restart the target database to apply:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_sla&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ /pg/pg96/bin/pg_ctl -D /pg/pg96data_pri -l /pg/pg96data_pri/server.log start 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server starting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg96data_sla&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ /pg/pg96/bin/pg_ctl -D /pg/pg96data_pri -l /pg/pg96data_pri/server.log stop
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to shut down.... &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/9.6/app-pgrewind.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/9.6/app-pgrewind.html&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>How I Got Into Wuhan University's Part-Time Master's Program</title><link>https://lastdba.com/en/2024/08/13/how-i-got-into-wuhan-universitys-part-time-masters-program/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/how-i-got-into-wuhan-universitys-part-time-masters-program/</guid><description>&lt;h2 class="relative group"&gt;Why Did I Want to Pursue a Part-Time Master&amp;rsquo;s?
 &lt;div id="why-did-i-want-to-pursue-a-part-time-masters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-did-i-want-to-pursue-a-part-time-masters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;To improve my academic credentials. My undergraduate degree is from an ordinary university. A higher degree can add a bit of competitiveness in my career.&lt;/li&gt;
&lt;li&gt;I once submitted my resume to a state-owned enterprise and was completely ghosted. But a colleague with better academic credentials in the same office got through. So for state-owned enterprises, higher education is the knock on the door.&lt;/li&gt;
&lt;li&gt;To make up for failing the graduate entrance exam as a senior and revive the dream of graduate studies.&lt;/li&gt;
&lt;li&gt;Learning is never wrong — this is my creed.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Differences Between Full-Time and Part-Time Graduate Programs
 &lt;div id="differences-between-full-time-and-part-time-graduate-programs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#differences-between-full-time-and-part-time-graduate-programs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Study Mode
 &lt;div id="study-mode" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#study-mode" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Full-time means you quit your job; part-time allows you to keep working. This basically locks in part-time as the only option for most working people.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Why Did I Want to Pursue a Part-Time Master&amp;rsquo;s?
 &lt;div id="why-did-i-want-to-pursue-a-part-time-masters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-did-i-want-to-pursue-a-part-time-masters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;To improve my academic credentials. My undergraduate degree is from an ordinary university. A higher degree can add a bit of competitiveness in my career.&lt;/li&gt;
&lt;li&gt;I once submitted my resume to a state-owned enterprise and was completely ghosted. But a colleague with better academic credentials in the same office got through. So for state-owned enterprises, higher education is the knock on the door.&lt;/li&gt;
&lt;li&gt;To make up for failing the graduate entrance exam as a senior and revive the dream of graduate studies.&lt;/li&gt;
&lt;li&gt;Learning is never wrong — this is my creed.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Differences Between Full-Time and Part-Time Graduate Programs
 &lt;div id="differences-between-full-time-and-part-time-graduate-programs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#differences-between-full-time-and-part-time-graduate-programs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Study Mode
 &lt;div id="study-mode" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#study-mode" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Full-time means you quit your job; part-time allows you to keep working. This basically locks in part-time as the only option for most working people.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Exam Scope
 &lt;div id="exam-scope" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#exam-scope" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Full-time exams are more demanding, usually covering four subjects: advanced math, graduate English, politics, and a specialized course.
Part-time exams are less demanding, covering two subjects: the Management Comprehensive Exam (middle school math, logic, writing) and graduate English.
Except for English, which is similar to the full-time version, the management comprehensive exam content is much easier than the full-time track (more on this later).&lt;/p&gt;

&lt;h3 class="relative group"&gt;Research Direction
 &lt;div id="research-direction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#research-direction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Full-time graduate programs lean toward research, emphasizing learning and research output, cultivating students&amp;rsquo; learning and research abilities.&lt;/p&gt;
&lt;p&gt;Part-time programs lean toward enhancing students&amp;rsquo; management skills, delivering management-oriented talent to society.&lt;/p&gt;
&lt;p&gt;These two directions are quite different.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Social Recognition
 &lt;div id="social-recognition" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#social-recognition" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Full-time graduate degrees certainly carry more recognition than part-time ones. After all, the bar is higher, the study pressure is greater, it&amp;rsquo;s the mainstream path, and society recognizes it more. But part-time degrees do carry recognition too — many schools have explicitly stated they treat both equally (on paper). Most importantly, part-time graduate students hold dual certificates (degree certificate and diploma).&lt;/p&gt;
&lt;p&gt;As for employment, it depends on the employer. Some positions only require any graduate degree, while others may explicitly state &amp;ldquo;full-time graduate degree required.&amp;rdquo; But for those who can&amp;rsquo;t quit their jobs to pursue higher education, part-time is practically the only path.
In summary: &lt;strong&gt;Part-time also grants dual certificates, but part-time recognition &amp;lt; full-time recognition.&lt;/strong&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;How to Choose a Major
 &lt;div id="how-to-choose-a-major" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-to-choose-a-major" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Consider your job nature, career aspirations, and your wallet.&lt;/p&gt;
&lt;p&gt;HR, finance, or corporate executive: MBA&lt;/p&gt;
&lt;p&gt;Technical roles or engineering management: MEM&lt;/p&gt;
&lt;p&gt;Civil servant or public administration: MPA&lt;/p&gt;
&lt;p&gt;Accounting: MPAcc. There are a few other niche options — search online.&lt;/p&gt;
&lt;p&gt;From a financial perspective, tuition varies by school but generally follows similar ranges. Taking Sichuan University as an example: MEM costs about 15,000 yuan per year, MBA about 150,000 yuan per year, MPA roughly similar to MEM.&lt;/p&gt;
&lt;p&gt;For an IT professional like me, with a thin wallet and not seeing myself as an executive, MEM is the better fit.&lt;/p&gt;

&lt;h2 class="relative group"&gt;How to Choose a School
 &lt;div id="how-to-choose-a-school" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-to-choose-a-school" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since the study difficulty is relatively low and social recognition isn&amp;rsquo;t as high as full-time, I recommend choosing a prestigious local university. 211 and 985 universities are strongly recommended — just pick one you like. Many 985 universities set their admission cutoff at the national line, so I personally feel that non-211/985 schools aren&amp;rsquo;t worth applying to. If the scores are the same, why not choose a better school?&lt;/p&gt;
&lt;p&gt;Of course, some 985 universities set their own cutoff lines. You&amp;rsquo;ll need to check the school&amp;rsquo;s department website for historical admission scores. For example, Sichuan University sets its own line every year, typically 20–30 points above the national line.&lt;/p&gt;

&lt;h2 class="relative group"&gt;How Does the Exam Work?
 &lt;div id="how-does-the-exam-work" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-does-the-exam-work" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Exam Content
 &lt;div id="exam-content" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#exam-content" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The exam is divided into the preliminary exam and the re-examination. The preliminary exam is in late December; the re-examination is in March.&lt;/p&gt;
&lt;p&gt;The preliminary exam is a written test. After registering for the exam and selecting a test venue, you take it in late December — finished in one day, each session 3 hours.&lt;/p&gt;
&lt;p&gt;The re-examination is an interview. A few schools add a written component, but since the pandemic, it&amp;rsquo;s all been online interviews — rarely do you need to write anything during the interview.&lt;/p&gt;
&lt;p&gt;Preliminary exam content (Management Comprehensive Exam):



&lt;img src="https://lastdba.com/img/csdn/c920095c8db9.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Re-examination content:



&lt;img src="https://lastdba.com/img/csdn/743ea585843c.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Early Interview
 &lt;div id="early-interview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#early-interview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The early interview means the school arranges an interview before the preliminary exam — effectively moving the re-examination earlier. Once you pass the early interview, you only need to reach the national line on the preliminary exam. Under the normal process, you&amp;rsquo;d need to exceed the school&amp;rsquo;s own cutoff line.&lt;/p&gt;
&lt;p&gt;Early interviews are only offered by some schools. For example, Tsinghua has an early-admission interview; Sichuan University doesn&amp;rsquo;t. You&amp;rsquo;ll need to check the official website of your target school.&lt;/p&gt;
&lt;p&gt;If you pass the early interview, the pressure on the preliminary exam is indeed lighter.&lt;/p&gt;

&lt;h2 class="relative group"&gt;How to Register
 &lt;div id="how-to-register" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-to-register" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The two most important websites in the graduate exam process are your target school&amp;rsquo;s official website and the China Graduate Admission Website (研招网, YZW).



&lt;img src="https://lastdba.com/img/csdn/85579cc5cd7e.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Before registering, check the master&amp;rsquo;s program catalog for your target school and major. For example, part-time engineering management should be selected as follows:



&lt;img src="https://lastdba.com/img/csdn/3a5abec906cf.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Should You Sign Up for a Training Course?
 &lt;div id="should-you-sign-up-for-a-training-course" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#should-you-sign-up-for-a-training-course" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Many people wonder whether to sign up for a training course. Signing up feels too expensive — what if you don&amp;rsquo;t pass? Not signing up means you don&amp;rsquo;t know how to study, or studying feels too exhausting.&lt;/p&gt;
&lt;p&gt;I have some authority on this question, because I did sign up for one.&lt;/p&gt;
&lt;p&gt;I saw a training course online and asked about the price — 8,000 yuan. On top of that, there was an information gap: I didn&amp;rsquo;t know what the exam covered, how to study, how to register, which school to apply to, or where to search for this information. (Searching &amp;ldquo;part-time graduate&amp;rdquo; on Baidu immediately yields nothing but ads.) Plus, I was genuinely determined to study at the time. So I fell into this trap&amp;hellip;&lt;/p&gt;

&lt;h3 class="relative group"&gt;What Did the Training Course Give Me?
 &lt;div id="what-did-the-training-course-give-me" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-did-the-training-course-give-me" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;First, a pile of study materials — study methods, past exam papers, and so on. Other than the English vocabulary list, which I immediately started memorizing, I barely touched anything else. I printed the past exam papers too, but I never looked at them — not even by the time the exam was over. They looked useful, but in reality, you can search past papers on Taobao and find plenty of officially published versions with detailed explanations — far more useful and less straining on the eyes. And that vocabulary list — strangely, the one the course gave me didn&amp;rsquo;t match the one in Zhang Jian&amp;rsquo;s Yellow Book. I memorized the course&amp;rsquo;s vocabulary for a long time, only to find that some common exam words weren&amp;rsquo;t in the list. Later I switched to Zhang Jian&amp;rsquo;s Yellow Book vocabulary and it felt much better.&lt;/p&gt;
&lt;p&gt;Beyond study materials, the most important component was live-streamed lectures, typically 8–10 PM — two hours of teaching and ten minutes of Q&amp;amp;A.&lt;/p&gt;
&lt;p&gt;The live lectures were useful, especially logic and math. Just listening to those two subjects essentially eliminated the need to buy extra books to thoroughly study the fundamentals of math and logic — you only needed to do the post-class exercises and practice problems. I barely listened to the English classes; I mostly self-studied. Personally, I felt that listening to English lectures was very inefficient and a waste of time — better to memorize more words and do more reading exercises. I only listened to the last two sessions of English writing, which were extremely useful. More on English writing later (with practical tips). Finally, don&amp;rsquo;t fantasize about asking the teacher questions — these online classes have many students, and the Q&amp;amp;A time is only about ten minutes. My questions were never picked.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Benefits of a training course:&lt;/strong&gt;
Convenience of learning. From a working person&amp;rsquo;s perspective, you&amp;rsquo;re already working overtime a lot. Coming home exhausted, expecting yourself to spread out materials and study like it&amp;rsquo;s the gaokao — too hard. But if it&amp;rsquo;s a lecture, you just sit on the sofa and watch the livestream. That&amp;rsquo;s much easier.
Saves time. No need to laboriously make a study plan and constantly adjust it. Listening to lectures is also easier than reading through a thick textbook on your own. Essentially, a training course is trading money for time.&lt;/p&gt;
&lt;p&gt;The good learning state of classmates motivates you. You&amp;rsquo;re not studying alone with no idea how others are doing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The pitfalls of training courses:&lt;/strong&gt;
Quality varies widely. The institution I signed up with was Shangde. I didn&amp;rsquo;t research them beforehand — they were quite mediocre. Some of their programs are contract-based: they refund if you don&amp;rsquo;t pass, but there were traps in the contract, and no one got refunds. Our group had many people fighting for refunds. Also, personal information leaks — basically every student received refund scam calls. Even I, who passed, got five or six such calls.&lt;/p&gt;
&lt;p&gt;Teacher quality is uneven. Some teachers were excellent; others seemed like they were just coasting. Some explanations were outright misleading. In my class, the math and logic teachers were especially good, English was garbage, and writing was misleading&amp;hellip;&lt;/p&gt;
&lt;p&gt;Don&amp;rsquo;t fantasize that a training course will train you into a great candidate. The course is only an aid — it mainly depends on you. From the day I started planning for the exam until the preliminary exam was over, I had basically zero weekends — every one was spent in the library or a café. I declined every social gathering.&lt;/p&gt;
&lt;p&gt;So, should you sign up for a training course?&lt;/p&gt;
&lt;p&gt;I think if you meet all of the following conditions, you can consider it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enough resolve. Since you&amp;rsquo;ve paid, don&amp;rsquo;t let it go to waste. I also don&amp;rsquo;t recommend refund-based programs that give you an escape route.&lt;/li&gt;
&lt;li&gt;Enough money. Online courses start at a few thousand yuan — my 8,000 yuan can serve as a reference. In-person courses are more expensive but offer face-to-face tutoring.&lt;/li&gt;
&lt;li&gt;Unable to bridge the information gap. The information gap may prevent you from planning your own study schedule. If the information gap is what drives you toward a course, I suggest looking at others&amp;rsquo; study plans and successful Bilibili uploaders&amp;rsquo; cases. The biggest source of information is always the school&amp;rsquo;s official website.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you have the time, energy, insufficient funds, or decent learning ability, you absolutely don&amp;rsquo;t need to sign up. In that case, making a study plan that suits you is especially important.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Preliminary Exam
 &lt;div id="the-preliminary-exam" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-preliminary-exam" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Before the preliminary exam is over, focus only on preparing for the preliminary exam.&lt;/strong&gt; Generally speaking, you can prepare for the re-examination content after the preliminary exam is done.
Preparing for the preliminary exam is the core of your studies, the most energy-consuming and competitive phase — this is where success is decided.&lt;/p&gt;

&lt;h3 class="relative group"&gt;How to Prepare for the Preliminary Exam — Study Plan
 &lt;div id="how-to-prepare-for-the-preliminary-exam--study-plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-to-prepare-for-the-preliminary-exam--study-plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To prepare for the preliminary exam, you need a study plan that fits you, and you need to throw yourself into it completely.&lt;/p&gt;
&lt;p&gt;The study plan is extremely, extremely, extremely important. You need to first examine yourself — what are your strengths and weaknesses, your circumstances, which subjects you&amp;rsquo;re unfamiliar with, and which ones need long-term study.&lt;/p&gt;
&lt;p&gt;Everyone&amp;rsquo;s situation is different. Let me first share my study plan — you can reference my approach to customizing a plan and my study methods. Because the pressure isn&amp;rsquo;t as high (compared to full-time), I strongly recommend starting in July or August. Starting too late means not enough time; starting too early makes it easy to slack off. Total study time should be 5–6 months. But if your English is really poor, start memorizing words a few months earlier.&lt;/p&gt;
&lt;p&gt;My personal conditions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Not enough time — often worked overtime until 9 PM, weekends generally off. Commute by subway, two hours total both ways.&lt;/li&gt;
&lt;li&gt;Math almost completely forgotten, never touched logic before, Chinese writing has been terrible since childhood, decent English vocabulary, reading comprehension fine, English writing completely unable.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given the study pressure and my personal conditions, my plan needed to be:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Memorize words. English is definitely the most time-consuming — it requires sustained, long-term vocabulary memorization. Before all other studying, memorize English II vocabulary. Since morning memory retention is best, I memorized words on the subway to work every day and also on weekend mornings. From August until the preliminary exam.&lt;/li&gt;
&lt;li&gt;English reading. Actually, once you&amp;rsquo;ve memorized the words, reading comprehension is easy. English II doesn&amp;rsquo;t have many long, complex sentences — if you know all the words, reading is no problem. But I personally enjoy English reading, so I scheduled daily reading of English originals. I can&amp;rsquo;t say it had a huge impact, but it wasn&amp;rsquo;t useless — consider it a supplement to exam prep. Importantly, hobbies make habits easier to form.&lt;/li&gt;
&lt;li&gt;Math and logic have similar study difficulty. Even though I was completely clueless, they were relatively easy to learn (the concepts are simple; the exam itself is another story — but more on that later). I studied math or logic from 8 PM to 10 PM on weekday evenings (mostly attending lectures — if you don&amp;rsquo;t have lectures, buy materials and self-study). This is also long-term study: early phase learning concepts, late phase practicing. Since I couldn&amp;rsquo;t always leave work on time, sometimes I had to use the evening commute and the one-hour lunch break to complete the daily math and logic tasks. (Never fall behind — one day of delay leads to a huge backlog.)&lt;/li&gt;
&lt;li&gt;Chinese writing. Prepare about one month before the exam — late November or early December. Look at writing materials and try writing yourself. Don&amp;rsquo;t aim for perfection — the main thing is to express the core idea clearly. Trust me, during the exam you&amp;rsquo;ll absolutely be writing in frantic cursive.&lt;/li&gt;
&lt;li&gt;English writing. Prepare about one month before the exam. Remember: absolutely, absolutely do not memorize model essays. Not only is it brutally hard to memorize them, they&amp;rsquo;re nearly impossible to adapt. Memorize 2–3 templates before the exam, then practice with past exam topics using the templates. You only need to swap in words — no situation where you pick up the pen and can&amp;rsquo;t write a single word.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So, my weekly plan:



&lt;img src="https://lastdba.com/img/csdn/8b233995a8b3.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;This arrangement felt quite suitable for me — it fully utilized fragmented time and made good use of weekends. The first 3–4 months build the foundation: English&amp;rsquo;s foundation is vocabulary, logic and math&amp;rsquo;s foundation is concepts. Weekly study on workdays could be consolidated and practiced on weekends. The final 1–1.5 months are mainly for writing and getting the feel of past papers.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Recommended Study Materials
 &lt;div id="recommended-study-materials" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#recommended-study-materials" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;English: Zhang Jian&amp;rsquo;s Yellow Book. Just buy the vocabulary book and past exam papers. Baicizhan (vocabulary app), iReading, WeChat public account: 考研英语外刊 (Graduate English Foreign Journals).&lt;/li&gt;
&lt;li&gt;English writing: I don&amp;rsquo;t recommend any purchasable writing guide. Use universal templates; don&amp;rsquo;t memorize model essays.&lt;/li&gt;
&lt;li&gt;Math: Chen Jian&amp;rsquo;s Math High Score Guide, past exam answer keys.&lt;/li&gt;
&lt;li&gt;Logic: Zhonggong&amp;rsquo;s Logic Easy Pass, past exam answer keys.&lt;/li&gt;
&lt;li&gt;Management comprehensive writing: Buy a popular one — they&amp;rsquo;re all not great. No need to master writing too deeply.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Don&amp;rsquo;t buy practice problem books — buy past papers directly.&lt;/strong&gt; The quality of existing practice problems doesn&amp;rsquo;t compare at all to past papers. For English, no need to buy practice books — directly buy past papers. For math and logic, beyond the built-in exercises that come with foundational study, don&amp;rsquo;t buy extra practice problem books. I did math practice problems for a short period — very time-consuming and ineffective. The key for math and logic is to solidify the fundamentals, cover all the concepts, then do past papers and review the answer explanations. In short, immersive study time (like weekends) should only be spent on past papers. Do the last 20 years&amp;rsquo; worth of papers, then cycle through them again. (20 sets of past papers — only 2 per weekend — takes over two months to finish one round; by the time you redo them, you&amp;rsquo;ve largely forgotten the earlier ones.) &lt;strong&gt;Always save the most recent two years&amp;rsquo; papers untouched — use them for timed self-testing two weeks before the exam.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;English Study
 &lt;div id="english-study" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#english-study" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;Vocabulary
 &lt;div id="vocabulary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vocabulary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Morning vocabulary memorization, rain or shine (it works especially well on the subway&amp;hellip;).
Baicizhan — some people like using it. I used it early on too, but I found it ineffective. It covers the entire vocabulary pool; after a year you may not even complete one full cycle, and you&amp;rsquo;ve long forgotten what you studied earlier. So I stopped using it later.
I strongly recommend my personal vocabulary method.&lt;/p&gt;
&lt;p&gt;Everyone&amp;rsquo;s vocabulary is different. At the start, you must go through all the words once (graduate exam vocabulary is about 5,000 words) and pull out the ones you don&amp;rsquo;t know onto a vocabulary list. Since carrying a vocabulary notebook on the subway is slightly awkward, I put it on my phone.&lt;/p&gt;
&lt;p&gt;I have a dedicated vocabulary photo album:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/63e6fcbbd3e9.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;When memorizing, open it, zoom in — effectively covering the definitions while memorizing.



&lt;img src="https://lastdba.com/img/csdn/63e6b5373162.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Cycle through like this. At first I did 2 pages a day, advancing 1 page a day. Later, 4 pages a day, advancing 4 pages a day. No matter what, cycle through — memorize until you can cover the definition and know the word&amp;rsquo;s meaning. For words easily confused, add them to the list, take another photo, and update the album.&lt;/p&gt;
&lt;p&gt;Before my preliminary exam, I had cycled through these words six or seven times. Basically, aside from beyond-syllabus words, there was nothing I didn&amp;rsquo;t know.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Reading
 &lt;div id="reading" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reading" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;English total score: 100. Reading ability components = Cloze 10 points + Reading Comprehension 40 points + New Question Type Reading 10 points = 60 points. No matter how poor your English ability, reading comprehension cannot be weak. My reading ability mostly came from daily foreign journal reading, such as iReading and 考研英语外刊. About 20 minutes a day, light study pressure. (Actually, it&amp;rsquo;s mainly vocabulary — if you know the words, sentences are easy to understand.)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;iReading: &amp;ldquo;Love the World&amp;rdquo; foreign journals, one passage a day. Under Reading Plan → More Collections → Foreign Journals → Love the World, subscribe to the monthly issues, one a day. Relatively easy, good for early-stage reading improvement.&lt;/li&gt;
&lt;li&gt;WeChat public account: 考研英语外刊, one passage a day. Updated daily. This account is very well done, highly recommended. Just harder — good for later-stage challenge. If you don&amp;rsquo;t fully understand, that&amp;rsquo;s fine; I sometimes couldn&amp;rsquo;t fully grasp it either, since the difficulty is a bit high.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Read along with morning vocabulary memorization. If short on time, you can also read on the way home.&lt;/p&gt;
&lt;p&gt;Long and complex sentences: English II doesn&amp;rsquo;t have many. Some people dedicate time specifically to studying them. If you want to study long sentences specifically, I especially recommend Liu Xiaoyan&amp;rsquo;s Long Sentences video series (just search on video sites — they&amp;rsquo;re all free). It&amp;rsquo;s very engaging and well-organized, easy to stick with. As for me, I only watched the simple sentence part of Liu Xiaoyan&amp;rsquo;s course and stopped. Because, first, I found that as long as you know the words, you basically understand the sentences; second, the videos are too long and numerous, taking up too much study time.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Writing
 &lt;div id="writing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#writing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Writing is divided into short composition (letter or notice, 10 points) and long composition (data analysis essay — bar chart/pie chart analysis, 15 points).&lt;/p&gt;
&lt;p&gt;Again, do NOT memorize model essays. Before the exam I bought a writing book and memorized 10 model essays — truly, truly excruciating to memorize, and impossible to adapt. After memorizing the model essays, the first time I attempted an English writing question, I couldn&amp;rsquo;t write a single word — no exaggeration.&lt;/p&gt;
&lt;p&gt;The most valuable thing in my training course was the English templates. Using the templates, I worked through all the past years&amp;rsquo; English writing topics — every single one could be adapted. The number of words to swap in doesn&amp;rsquo;t exceed twenty; you just need to be able to write simple sentences. Here are the templates:&lt;/p&gt;
&lt;p&gt;Short composition template — Letter:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Dear Sir or Madam,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I am an undergraduate who majos in Applied English in this/a university.I am writing this letter &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; the purpose of doing sth.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1.It,first an formost,is my idea that not only ... but also
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2.Then more importantly,so ... that...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 3.The last on I must point out is that 简单句,which could be accepted by the majority of 人/.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; So It is the very moment &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; me to &lt;span style="color:#66d9ef"&gt;do&lt;/span&gt; ...,And I am looking forward to your reply.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; yours truly,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xxx.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Where &amp;ldquo;doing sth&amp;rdquo; includes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1.感谢信:expressing my genuine gratitude &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; your kind help
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2.建议信:making some suggestions concerning sth.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;3.投诉信:making my complaints concerning sth.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;4.祝贺信:show my sincere congratulations to you because 句子
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5.道歉信:offer my sincere apology to you because 句子
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;6.邀请信:invite you to participate in 活动 on behalf of 某人/组织
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7.通知信:have 某人 informed that 句子&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The letter template works for all types of letters.&lt;/p&gt;
&lt;p&gt;Besides letters, the short composition may — with low probability — test notices. The notice format differs from letters.&lt;/p&gt;
&lt;p&gt;Short composition template — Notice:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Notice
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; In an effort to &lt;span style="color:#66d9ef"&gt;do&lt;/span&gt; sth,I woud like to offer you some detailed information about it.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; The 活动 will be held in the school auditorium at &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; p.m.,next Saturday,December 28th and the requirements &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; sth. are listed as follows.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 主段内容同书信...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; If you have any questions,please feel free to send on email to
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;studentsunion@123.com or call 1234567.We are looking forward to your participation.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The long composition essentially only involves analyzing bar charts and pie charts. The data falls into two categories: comparing magnitudes and comparing trends. Only the first paragraph differs between the two; the latter two paragraphs are the same.&lt;/p&gt;
&lt;p&gt;Long composition template:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;(&lt;/span&gt;比大小首段&lt;span style="color:#f92672"&gt;)&lt;/span&gt;The diagram clearly shows/illustrates/d that 句子/词组&lt;span style="color:#f92672"&gt;(&lt;/span&gt;the purposes of/attitudes toward/the proportions of&lt;span style="color:#f92672"&gt;)&lt;/span&gt; among participants/respondents in a certain college.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Based on the data offered,one can distinctly see that 对象1 ranks the first/highest among all the categories,accounting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; 数据1.Next are 对象2 and 对象3 with 数据2 and
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;数据3 respectively ,while 对象4 only constitutes 数据4.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;(&lt;/span&gt;比趋势首段&lt;span style="color:#f92672"&gt;)&lt;/span&gt;The diagram clearly illustrates how 话题 changed during the past several years.Based on the data provided,one can distinctly see that the number of 对象1 rose/fell significantly/slightly/gradually from 数据 in 年 to 数据 in 年,while that the number of 对象2 experienced a gradual/significant increase/decrease during the same period,reaching 数据 in 年.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; From my standpoint,there are two fundamental factors that are responsible &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; this scence.To begin with,the first contributing factor is that 句子.In addition,another important factor that cannot be ignored is that 句子.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; In view of the analysis above,we can conclude that it is of little surprise to see this phenomenon in the current era.Therefore,it can be predicted that 名词词组/动词ing will still take up a large share in the future.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Writing without templates — relying on your own ability — is extremely difficult. Getting an ultra-high score with templates is hard, but getting 70–80% of the score is no problem, and the upfront investment is basically zero. After memorizing the templates, just write through all the past years&amp;rsquo; writing topics once.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Math and Logic Study
 &lt;div id="math-and-logic-study" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#math-and-logic-study" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;Math questions:
 &lt;div id="math-questions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#math-questions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e16c5ab94d30.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;Logic questions:
 &lt;div id="logic-questions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logic-questions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/bc084fe7654f.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Math and logic are both multiple choice — nothing special to say. Early phase: learn concepts. Later phase: improve speed.&lt;/p&gt;
&lt;p&gt;Math and logic have a huge number of concepts. The early study phase takes 3–4 months, 2–3 hours a day, to learn all the concepts. After mastering the concepts, practice with past papers and review answer explanations. In the final month, practice with a stopwatch to improve speed: math questions within 70 minutes, logic within 60 minutes.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Chinese Writing Study
 &lt;div id="chinese-writing-study" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#chinese-writing-study" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The management comprehensive essay is divided into Argument Validity Analysis and Argumentative Essay.&lt;/p&gt;
&lt;p&gt;Argument Validity Analysis is essentially nitpicking — get a writing guide and look through it; it&amp;rsquo;s not hard. You need to find the logical flaws in a lengthy passage of material. When writing, find four problem points. If you haven&amp;rsquo;t studied it, you might struggle to find them; after studying, finding four points is fairly easy. Don&amp;rsquo;t worry about naming the flaws precisely — overgeneralization, equivocation, false dichotomy, etc. Just write &amp;ldquo;xxx does not lead to xxx.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The Argumentative Essay mainly involves interpreting a short passage. The key is not to misinterpret the theme. Finding the theme is also challenging at first — look at more sample materials to get a feel for it; generally you can locate the theme. The standard structure is introduction-body-conclusion. I recommend using &amp;ldquo;individual – enterprise – nation&amp;rdquo; as the framework (intro – individual – enterprise – nation – conclusion, 5 paragraphs total). Pick a few tried-and-tested points to plug in. Some students with strong writing skills write argumentative essays using other approaches — I certainly admire that. But writing time is extremely limited. Unless you&amp;rsquo;re naturally gifted with lightning-fast thinking, I recommend using a formulaic approach. &lt;strong&gt;Finishing the essay is the top priority.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Exam Time Strategy
 &lt;div id="exam-time-strategy" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#exam-time-strategy" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Yes, you need to strategize the exam timing too. Trust me 100% — you will not finish the management comprehensive exam. It&amp;rsquo;s the most time-crunched exam I&amp;rsquo;ve ever taken. You know you can solve the problems, but you have no time to compute.&lt;/p&gt;
&lt;p&gt;On exam day: morning — management comprehensive, 3 hours. Afternoon — English, 3 hours.&lt;/p&gt;
&lt;p&gt;English: 3 hours, relatively little content, no need for repeated recalculation — time is completely sufficient. When I finished, I had 50 minutes left and left early.&lt;/p&gt;
&lt;p&gt;Management comprehensive: 3 hours, absolutely not enough. In my pre-exam self-timed simulations, I consistently took 4 hours. During the real exam, for math — any question over 3 minutes, skip immediately. If it feels computationally heavy, skip immediately. For logic — absolutely cannot use your usual analytical approach. Speed-read the question (logic questions have colossal amounts of text), look at the options, pick whatever feels right. Logic questions requiring computation: temporarily abandon, come back later if time permits. For writing — read the prompt and start writing immediately. Write as fast as you possibly can (both essays combined no more than 1 hour). Every 2 minutes saved could rescue a multiple-choice question.&lt;/p&gt;
&lt;p&gt;You don&amp;rsquo;t have to finish all the multiple-choice (do fill in the answer sheet completely though), but you MUST finish the writing. So the question order is important. Many people do the essays first, then multiple-choice. I did math first, then essays, then logic. Either way, don&amp;rsquo;t leave writing for the end. In the real exam, both essays must be finished within 55 minutes — 1,500 words total, plus reading the prompt and brainstorming. Try it once and you&amp;rsquo;ll know how impossibly short the time is.&lt;/p&gt;
&lt;p&gt;The last 20 minutes: fill in the answer sheet. After filling it in, continue solving problems.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Re-examination
 &lt;div id="the-re-examination" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-re-examination" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Basic Information About the Re-examination
 &lt;div id="basic-information-about-the-re-examination" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#basic-information-about-the-re-examination" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If you&amp;rsquo;ve made it past the national line or the school&amp;rsquo;s own cutoff — congratulations, you&amp;rsquo;ve completed 90% of the journey. The remaining 10% is the re-examination. The re-examination has a mandatory elimination rate (required by national policy), typically around 70–80% passing rate. Since elimination must exist, some people get cut every year. If you don&amp;rsquo;t prepare, you&amp;rsquo;re very likely to be among them. Here&amp;rsquo;s a joke: I got cut from Sichuan University&amp;rsquo;s re-examination~&lt;/p&gt;
&lt;p&gt;Re-examination timeframe: mid-to-late March each year.&lt;/p&gt;
&lt;p&gt;Score release: mid-March.&lt;/p&gt;
&lt;p&gt;Content: spoken English, specialized knowledge, comprehensive interview, politics (Sichuan University: open-book politics, no need to prepare. Wuhan University: closed-book written politics&amp;hellip;)&lt;/p&gt;
&lt;p&gt;Since the pandemic, re-examinations have been online interviews — no written test environment. Experts ask questions; you answer.&lt;/p&gt;
&lt;p&gt;So you have three months to prepare for the re-examination. Conveniently, the preliminary exam ends late December, scores aren&amp;rsquo;t out yet, and February is Chinese New Year — realistically, most people start preparing only when scores are released. Take me as a cautionary example: I received the re-examination notice on March 21, the re-examination was on March 27 — I had six days to prepare, including spoken English and engineering management, which I&amp;rsquo;d never touched before&amp;hellip; So it was embarrassing: I couldn&amp;rsquo;t answer a single one of the examiner&amp;rsquo;s English questions, couldn&amp;rsquo;t answer a single specialized question. Cut from the re-examination.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Post-Adjustment (Tiaoji)
 &lt;div id="post-adjustment-tiaoji" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#post-adjustment-tiaoji" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When I found out I&amp;rsquo;d failed Sichuan University&amp;rsquo;s re-examination, my mood plummeted. But — every cloud has a silver lining. The adjustment (tiaoji) process was my lifesaver. While searching for adjustment schools, I found Wuhan University.&lt;/p&gt;
&lt;p&gt;The China Graduate Admission Website has a dedicated adjustment window, giving students who failed their initial re-examination three more interview opportunities. You can fill in three preferences — three schools to apply to. Since each school has different re-examination dates and requirements, preparing for all of them is very hard. I focused mainly on preparing for Wuhan University&amp;rsquo;s adjustment. The adjustment, of course, also involves a re-examination — essentially, schools that haven&amp;rsquo;t filled their enrollment quotas run the process again, giving students who weren&amp;rsquo;t admitted in the first round another chance.&lt;/p&gt;
&lt;p&gt;Adjustment window: late March to early April.&lt;/p&gt;

&lt;h3 class="relative group"&gt;How to Prepare for the Re-examination?
 &lt;div id="how-to-prepare-for-the-re-examination" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-to-prepare-for-the-re-examination" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The re-examination is also highly competitive. Lazy people like me are not uncommon&amp;hellip; But no matter what, you&amp;rsquo;ve already invested over half a year — you can&amp;rsquo;t let it go down the drain. (I almost did&amp;hellip;) For non-specialist students like me, the hardest parts of the re-examination are spoken English and specialized knowledge. From score release to the re-examination, you have about one week (while still working!), so learning from scratch is impossible. Based on my experience, the following approaches, in descending order of importance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Find seniors who&amp;rsquo;ve been through it and get past re-examination materials (discreetly — sharing re-examination materials externally is prohibited) and course materials. See if anyone you know is at that school, or find groups on forums or Tieba.&lt;/li&gt;
&lt;li&gt;Search Bilibili for common graduate re-examination questions. Summarize them and memorize.&lt;/li&gt;
&lt;li&gt;Buy the school&amp;rsquo;s recommended reference books (usually course materials). They&amp;rsquo;re thick; you won&amp;rsquo;t finish them.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Finally, the most important thing: mock re-examination. Summarize potential English questions, specialized questions, and comprehensive interview questions, then find a partner to act as the examiner for a mock interview.&lt;/p&gt;
&lt;p&gt;There are other re-examination requirements — keep an eye on department updates and your email: score weightings, interview process, dual-camera setup, interview schedule, document preparation, etc.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The End
 &lt;div id="the-end" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-end" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The 2022 national preliminary exam line was 185. My preliminary score was 210 (English 80, Management Comprehensive 130). Here&amp;rsquo;s my re-examination acceptance notice ^_^



&lt;img src="https://lastdba.com/img/csdn/afd22229a49a.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Good luck to all working-student-warriors battered by society but still holding onto your dreams — may your graduate exam go smoothly. You&amp;rsquo;ve got this!!!&lt;/p&gt;</content:encoded></item><item><title>OGG Oracle-to-PostgreSQL Sync — Hands-On Steps</title><link>https://lastdba.com/en/2024/08/13/ogg-oracle-to-postgresql-sync-hands-on-steps/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/ogg-oracle-to-postgresql-sync-hands-on-steps/</guid><description>&lt;p&gt;Source DB: Oracle (11.2.0.4) 192.168.10.141
Target DB: PGSQL (10.12) 192.168.10.128
OGG software version: (19.1.0.0.4)
OGG download: Oracle GoldenGate Downloads
glibc issue handling: &lt;a href="https://www.cnblogs.com/hxlasky/p/16779047.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/hxlasky/p/16779047.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;1. Install OGG Software on Source and Target
 &lt;div id="1-install-ogg-software-on-source-and-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-install-ogg-software-on-source-and-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Source:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A. Configure response file: oggcore.rsp&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle.install.responseFileVersion=/home/oracle/oggcore.rsp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INSTALL_OPTION=ORA11g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SOFTWARE_LOCATION=/oracle/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;START_MANAGER=false
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER_PORT=7809
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATABASE_LOCATION=/oracle/db/11.2.0.4
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INVENTORY_LOCATION=/oracle/oraInventory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;UNIX_GROUP_NAME=oinstall&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;B. Silent install OGG&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./runInstaller -silent -nowait -responseFile /home/oracle/oggcore.rsp&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle@szgtsp431-or@ecsdb&amp;gt;./runInstaller -silent -nowait -responseFile /home/oracle/oggcore.rsp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Starting Oracle Universal Installer...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Checking Temp space: must be greater than 120 MB. Actual 32405 MB Passed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Checking swap space: must be greater than 150 MB. Actual 2048 MB Passed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Preparing to launch Oracle Universal Installer from /tmp/OraInstall2020-08-14_08-57-27AM. Please wait ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You can find the log of this install session at:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; /oracle/oraInventory/logs/installActions2020-08-14_08-57-27AM.log
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully Setup Software.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;The installation of Oracle GoldenGate Core was successful.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Please check &amp;#39;/oracle/oraInventory/logs/silentInstall2020-08-14_08-57-27AM.log&amp;#39; for more details.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;2. Set Database to Archive Mode
 &lt;div id="2-set-database-to-archive-mode" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2-set-database-to-archive-mode" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle@szgtsp431-or@ecsdb&amp;gt;sqlplus / as sysdba
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SQL*Plus: Release 11.2.0.4.0 Production on Fri Aug 14 09:06:34 2020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Copyright (c) 1982, 2013, Oracle. All rights reserved.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Connected to:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;With the Partitioning, OLAP, Data Mining and Real Application Testing options
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SQL&amp;gt; archive log list;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database log mode Archive Mode
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Automatic archival Enabled
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Archive destination /oracle/oradata/archivelog
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Oldest online log sequence 19
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Next log sequence to archive 21
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Current log sequence 21&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;3. Enable Force Logging and Minimum Supplemental Logging
 &lt;div id="3-enable-force-logging-and-minimum-supplemental-logging" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#3-enable-force-logging-and-minimum-supplemental-logging" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;force&lt;/span&gt; logging;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; supplemental log &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; switch logfile;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Verify force logging and minimum supplemental logging enabled:&lt;/p&gt;</description><content:encoded>&lt;p&gt;Source DB: Oracle (11.2.0.4) 192.168.10.141
Target DB: PGSQL (10.12) 192.168.10.128
OGG software version: (19.1.0.0.4)
OGG download: Oracle GoldenGate Downloads
glibc issue handling: &lt;a href="https://www.cnblogs.com/hxlasky/p/16779047.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/hxlasky/p/16779047.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;1. Install OGG Software on Source and Target
 &lt;div id="1-install-ogg-software-on-source-and-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-install-ogg-software-on-source-and-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Source:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;A. Configure response file: oggcore.rsp&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle.install.responseFileVersion=/home/oracle/oggcore.rsp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INSTALL_OPTION=ORA11g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SOFTWARE_LOCATION=/oracle/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;START_MANAGER=false
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER_PORT=7809
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATABASE_LOCATION=/oracle/db/11.2.0.4
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INVENTORY_LOCATION=/oracle/oraInventory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;UNIX_GROUP_NAME=oinstall&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;B. Silent install OGG&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./runInstaller -silent -nowait -responseFile /home/oracle/oggcore.rsp&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle@szgtsp431-or@ecsdb&amp;gt;./runInstaller -silent -nowait -responseFile /home/oracle/oggcore.rsp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Starting Oracle Universal Installer...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Checking Temp space: must be greater than 120 MB. Actual 32405 MB Passed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Checking swap space: must be greater than 150 MB. Actual 2048 MB Passed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Preparing to launch Oracle Universal Installer from /tmp/OraInstall2020-08-14_08-57-27AM. Please wait ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You can find the log of this install session at:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; /oracle/oraInventory/logs/installActions2020-08-14_08-57-27AM.log
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully Setup Software.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;The installation of Oracle GoldenGate Core was successful.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Please check &amp;#39;/oracle/oraInventory/logs/silentInstall2020-08-14_08-57-27AM.log&amp;#39; for more details.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;2. Set Database to Archive Mode
 &lt;div id="2-set-database-to-archive-mode" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2-set-database-to-archive-mode" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle@szgtsp431-or@ecsdb&amp;gt;sqlplus / as sysdba
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SQL*Plus: Release 11.2.0.4.0 Production on Fri Aug 14 09:06:34 2020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Copyright (c) 1982, 2013, Oracle. All rights reserved.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Connected to:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;With the Partitioning, OLAP, Data Mining and Real Application Testing options
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SQL&amp;gt; archive log list;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database log mode Archive Mode
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Automatic archival Enabled
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Archive destination /oracle/oradata/archivelog
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Oldest online log sequence 19
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Next log sequence to archive 21
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Current log sequence 21&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;3. Enable Force Logging and Minimum Supplemental Logging
 &lt;div id="3-enable-force-logging-and-minimum-supplemental-logging" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#3-enable-force-logging-and-minimum-supplemental-logging" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;force&lt;/span&gt; logging;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; supplemental log &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; switch logfile;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Verify force logging and minimum supplemental logging enabled:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; force_logging,supplemental_log_data_min &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; v$database;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;4. Set enable_goldengate_replication Parameter
 &lt;div id="4-set-enable_goldengate_replication-parameter" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#4-set-enable_goldengate_replication-parameter" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_goldengate_replication&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;scope&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;both&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If RAC, all nodes must be modified:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_goldengate_replication&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;scope&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;both&lt;/span&gt; sid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;*&amp;#39;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;5. Create OGG User, Tablespace, and Grant Privileges
 &lt;div id="5-create-ogg-user-tablespace-and-grant-privileges" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#5-create-ogg-user-tablespace-and-grant-privileges" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; tablespace tbs_ogg datafile &lt;span style="color:#e6db74"&gt;&amp;#39;/oracle/oradata/datafile/tbs_ogg01.dbf&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;M;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; goldengate identified &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;123456&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; tablespace tbs_ogg &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; tablespace temp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;session&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;session&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; resource &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;connect&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;dictionary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; flashback &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; dba_clusters &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;execute&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; dbms_flashback &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; sequence &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; dba &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;lock&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;any&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; goldengate;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;6. Enable Table-Level Supplemental Logging
 &lt;div id="6-enable-table-level-supplemental-logging" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#6-enable-table-level-supplemental-logging" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To sync table data from specific schemas, enable supplemental logging on those tables.&lt;/p&gt;
&lt;p&gt;Check supplemental logging:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;, log_group_name, log_group_type,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; decode(always, &lt;span style="color:#e6db74"&gt;&amp;#39;ALWAYS&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;Unconditional&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;Conditional&amp;#39;&lt;/span&gt;) always
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; dba_log_groups
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;, log_group_name;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Enable supplemental logging during low-activity window:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle@szgtsp431-or@ecsdb&amp;gt;ggsci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Oracle GoldenGate Command Interpreter for Oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Version 19.1.0.0.4 OGGCORE_19.1.0.0.0_PLATFORMS_191017.1054_FBO
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Linux, x64, 64bit (optimized), Oracle 11g on Oct 17 2019 23:13:12
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Operating system character set identified as US-ASCII.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Copyright (C) 1995, 2019, Oracle and/or its affiliates. All rights reserved.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or) 1&amp;gt; dblogin userid goldengate,password 123456
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully logged into database.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 2&amp;gt; add trandata ecs.*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-08-14 09:13:54 INFO OGG-15132 Logging of supplemental redo data enabled for table ECS.DEPT.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-08-14 09:13:54 INFO OGG-15133 TRANDATA for scheduling columns has been added on table ECS.DEPT.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-08-14 09:13:54 INFO OGG-15135 TRANDATA for instantiation CSN has been added on table ECS.DEPT.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-08-14 09:13:54 INFO OGG-15132 Logging of supplemental redo data enabled for table ECS.INFO.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Verify all supplemental logging added:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; dba_tables &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;BGLWT&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; minus
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; dba_log_groups)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- no rows selected = all table-level supplemental logging added successfully&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;7. Configure Manager Process
 &lt;div id="7-configure-manager-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#7-configure-manager-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle@szgtsp431-or@ecsdb&amp;gt;ggsci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or) 1&amp;gt; dblogin userid goldengate,password 123456
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully logged into database.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 2&amp;gt; create subdirs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Creating subdirectories under current directory /home/oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 3&amp;gt; edit param mgr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PORT 7809
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DYNAMICPORTLIST 7810-7980
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PURGEDDLHISTORY MINKEEPDAYS 7, MAXKEEPDAYS 10
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LAGREPORTHOURS 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LAGINFOMINUTES 30
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LAGCRITICALMINUTES 45&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;8. Configure Extract Process
 &lt;div id="8-configure-extract-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#8-configure-extract-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 7&amp;gt; add extract extecs, tranlog, threads 1,begin now
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT added.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 8&amp;gt; add exttrail ./dirdat/lt, extract extecs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTTRAIL added.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 9&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER RUNNING 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT STOPPED EXTECS 00:00:00 00:00:38 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 10&amp;gt; edit param extecs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT extecs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SETENV (ORACLE_HOME = &amp;#34;/oracle/db/11.2.0.4&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SETENV (ORACLE_SID = &amp;#34;ecsdb&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;USERID goldengate, PASSWORD 123456
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTTRAIL ./dirdat/lt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANLOGOPTIONS EXCLUDEUSER goldengate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANLOGOPTIONS DBLOGREADER
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DBOPTIONS ALLOWUNUSEDCOLUMN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FETCHOPTIONS USESNAPSHOT, USELATESTVERSION, MISSINGROW REPORT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;STATOPTIONS REPORTFETCH
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WARNLONGTRANS 1h, CHECKINTERVAL 10m
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DYNAMICRESOLUTION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DISCARDFILE ./dirrpt/extecs.dsc, APPEND, MEGABYTES 1024
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DISCARDROLLOVER AT 6:00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPORTROLLOVER AT 6:00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPORTCOUNT EVERY 1 MINUTES, RATE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DDL INCLUDE MAPPED
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DDLOPTIONS ADDTRANDATA, REPORT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DDLOPTIONS NOCROSSRENAME, REPORT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TABLE ECS.*;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;9. Configure Pump Process
 &lt;div id="9-configure-pump-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#9-configure-pump-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 11&amp;gt; add extract deliecs, exttrailsource ./dirdat/lt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT added.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 12&amp;gt; add rmttrail ./dirdat/rt, extract deliecs, megabytes 500
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RMTTRAIL added.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 13&amp;gt; edit param deliecs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT deliecs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PASSTHRU
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DYNAMICRESOLUTION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RMTHOST 192.168.10.100, MGRPORT 7809
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RMTTRAIL ./dirdat/rt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DISCARDFILE ./dirrpt/deliecs.dsc, APPEND, MEGABYTES 1024
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DISCARDROLLOVER AT 6:00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPORTCOUNT EVERY 1 MINUTES, RATE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPORT AT 0:00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPORT AT 1:00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPORT AT 23:00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPORTROLLOVER AT 00:00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;STATOPTIONS RESETREPORTSTATS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TABLE ECS.*; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;10. Start Extract Process
 &lt;div id="10-start-extract-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#10-start-extract-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 20&amp;gt; start extecs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Sending START request to MANAGER ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT EXTECS starting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 21&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER RUNNING 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT STOPPED DELIECS 00:00:00 00:06:06 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT RUNNING EXTECS 00:00:00 00:00:01 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;11. Configure Target OGG Software
 &lt;div id="11-configure-target-ogg-software" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#11-configure-target-ogg-software" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;A. Upload OGG software and extract&lt;/strong&gt;
&lt;strong&gt;B. Configure OGG environment variables&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pgsql@szgtsp428-or ~]$ vi .bash_profile
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## .bash_profile
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## Get the aliases and functions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;if [ -f ~/.bashrc ]; then
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; . ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fi
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;## User specific environment and startup programs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PATH=$PATH:$HOME/bin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export PATH
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export PGHOME=/usr/local/pgsql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export PGDATA=/data/pgsql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export OGG_HOME=/data/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export PATH=$PATH:$PGHOME/bin:$OGG_HOME
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LD_LIBRARY_PATH=$PGHOME/lib
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/lib:/usr/lib:/usr/local/lib:$OGG_HOME/lib
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export LD_LIBRARY_PATH
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export ODBCINI=/home/pgsql/odbc.ini
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export DD_ODBC_HOME=/data/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pgsql@szgtsp428-or ~]$ ggsci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Oracle GoldenGate Command Interpreter for PostgreSQL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Version 19.1.0.0.200714 OGGCORE_19.1.0.0.0OGGBP_PLATFORMS_200628.2141
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Linux, x64, 64bit (optimized), PostgreSQL on Jun 29 2020 03:59:15
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Operating system character set identified as UTF-8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Copyright (C) 1995, 2019, Oracle and/or its affiliates. All rights reserved.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 1&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;12. Create Database and Table on Target
 &lt;div id="12-create-database-and-table-on-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#12-create-database-and-table-on-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ecsdb=# \l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; List of databases
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Name | Owner | Encoding | Collate | Ctype | Access privileges 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------+----------+----------+-------------+-------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ecsdb | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; postgres | pgsql | UTF8 | en_US.UTF-8 | en_US.UTF-8 | 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; template0 | pgsql | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/pgsql +
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | | | | | pgsql=CTc/pgsql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; template1 | pgsql | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/pgsql +
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; | | | | | pgsql=CTc/pgsql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(4 rows)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ecsdb=# \d
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; List of relations
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Schema | Name | Type | Owner 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--------+--------------+-------+----------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; public | student_info | table | postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(1 row)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ecsdb=# select * from student_info;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id | name | address 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----+------+---------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1 | Zhang San | Guangzhou
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2 | Li Si | Shenzhen
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 3 | Wang Wu | Shanghai
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 4 | Zhao Liu | Beijing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 5 | Sun Qi | Wuhan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 6 | A Da | Chengdu
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 7 | A Er | Nanjing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(7 rows)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;13. Configure Target Manager Process and Start
 &lt;div id="13-configure-target-manager-process-and-start" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#13-configure-target-manager-process-and-start" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pgsql@szgtsp428-or ogg]$ ggsci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Oracle GoldenGate Command Interpreter for PostgreSQL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 1&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER STOPPED 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 2&amp;gt; create subdirs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Creating subdirectories under current directory /data/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 3&amp;gt; edit param mgr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;port 7809
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 4&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 5&amp;gt; start mgr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Manager started.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 7&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER RUNNING &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now start the pump process on source (deliecs):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle@szgtsp431-or@ecsdb&amp;gt;ggsci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or) 1&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER RUNNING 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT ABENDED DELIECS 00:00:00 01:06:41 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT RUNNING EXTECS 00:00:00 00:00:07 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or) 2&amp;gt; start deliecs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Sending START request to MANAGER ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT DELIECS starting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or) 3&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER RUNNING 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT RUNNING DELIECS 00:00:00 01:06:55 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT RUNNING EXTECS 00:00:00 00:00:01 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;14. Target PostgreSQL Parameter Adjustment
 &lt;div id="14-target-postgresql-parameter-adjustment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#14-target-postgresql-parameter-adjustment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-ini" data-lang="ini"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;wal_level&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;logical #minimal, replica, or logical&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;max_replication_slots&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;10 #max number of replication slots&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;max_wal_sender&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;10 #maximum number of wal sender processes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;wal_receiver_status_interval&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;10s #optional, keep the system default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;wal_sender_timeout&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;60s #optional, keep the system default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;track_commit_timestamp&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;off #optional, keep the system default&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Restart PostgreSQL after adjusting parameters:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pgsql@szgtsp428-or pgsql]$ pg_ctl stop -D /data/pgsql/ -l /data/pgsql/logfile
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting for server to shut down.... done
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server stopped
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pgsql@szgtsp428-or pgsql]$ pg_ctl start -D /data/pgsql/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting for server to start.... done
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;15. Data Source Configuration (odbc.ini)
 &lt;div id="15-data-source-configuration-odbcini" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#15-data-source-configuration-odbcini" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-ini" data-lang="ini"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;[ODBC Data Sources]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;PGDSN&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;DataDirect 10.12 PostgreSQL Wire Protocol&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;postgres&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;DataDirect 10.12 PostgreSQL Wire Protocol&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;scott&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;DataDirect 10.12 PostgreSQL Wire Protocol&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;[ODBC]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;IANAAppCodePage&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;InstallDir&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;/data/ogg&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;[PGDSN]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Driver&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;/data/ogg/lib/GGpsql25.so&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Description&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;DataDirect 10.12 PostgreSQL Wire Protocol&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;ecsdb&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HostName&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;127.0.0.1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;PortNumber&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;5432&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;LogonID&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;postgres&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;Password&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;123456&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;16. Connection Test
 &lt;div id="16-connection-test" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#16-connection-test" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pgsql@szgtsp428-or ~]$ cd /data/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pgsql@szgtsp428-or ogg]$ ggsci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 1&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER RUNNING 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 2&amp;gt; dblogin sourcedb pgdsn userid postgres, password postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-08-14 11:35:01 INFO OGG-03036 Database character set identified as UTF-8. Locale: en_US.UTF-8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-08-14 11:35:01 INFO OGG-03037 Session character set identified as UTF-8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully logged into database.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;17. Configure and Start Replicat Process on Target
 &lt;div id="17-configure-and-start-replicat-process-on-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#17-configure-and-start-replicat-process-on-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Add checkpoint table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or) 1&amp;gt; dblogin sourcedb pgdsn userid postgres, password 123456
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully logged into database.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or as postgres@pgdsn) 2&amp;gt; add checkpointtable public.chkt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully created checkpoint table public.chkt.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Configure replicat:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or as postgres@pgdsn) 34&amp;gt; edit param repl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPLICAT repl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SOURCEDEFS ./dirdef/student_info.def
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SETENV (PGCLIENTENCODING = &amp;#34;UTF8&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SETENV (ODBCINI=&amp;#34;/home/pgsql/odbc.ini&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SETENV (NLS_LANG=&amp;#34;AMERICAN_AMERICA.AL32UTF8&amp;#34;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;targetdb pgdsn userid postgres, password 123456
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DISCARDFILE ./dirrpt/repl.dsc, purge
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MAP ecs.student_info, TARGET public.student_info;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or as postgres@pgdsn) 36&amp;gt; add replicat repl,exttrail ./dirdat/rt,checkpointtable public.chkt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPLICAT added.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or as postgres@pgdsn) 38&amp;gt; start repl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Sending START request to MANAGER ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPLICAT REPL starting
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or as postgres@pgdsn) 55&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER RUNNING 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPLICAT RUNNING REPL 00:00:00 00:00:08&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;18. Test Verification
 &lt;div id="18-test-verification" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#18-test-verification" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;First, create matching table structure on target:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; student_info (id int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;, name varchar(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;), address varchar(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then initialize data:&lt;/p&gt;
&lt;p&gt;Configure extinit process on source:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 17&amp;gt; edit param extinit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT extinit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;userid goldengate, PASSWORD 123456
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPORTCOUNT EVERY 30 MINUTES, RATE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DISCARDFILE ./dirrpt/extinit.dsc, APPEND, MEGABYTES 1024
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RMTHOST 192.168.10.100,MGRPORT 7809, compress
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RMTTASK replicat,GROUP replinit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TABLE ecs.student_info;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 18&amp;gt; ADD EXTRACT extinit, SOURCEISTABLE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT added.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Configure replinit process on target:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or as postgres@pgdsn) 28&amp;gt; edit param replinit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPLICAT replinit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;targetDB pgdsn, USERID postgres, PASSWORD 123456
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;discardfile ./dirrpt/replinit.dsc, PURGE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SOURCEDEFS ./dirdef/student_info.def
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Map ecs.student_info,target public.student_info;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp428-or as postgres@pgdsn) 29&amp;gt; add replicat repinit, SPECIALRUN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPLICAT added.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Start Oracle-to-PG data initialization:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI (szgtsp431-or as goldengate@ecsdb) 9&amp;gt; start extinit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Sending START request to MANAGER ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EXTRACT EXTINIT starting&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Target: (view initialization row count via View report replicat)&lt;/p&gt;
&lt;p&gt;Check both sides:&lt;/p&gt;
&lt;p&gt;Source (Oracle):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SQL&amp;gt; select * from student_info;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ID NAME ADDRESS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------- ---------- ----------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1 Zhang San Guangzhou
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2 Li Si Shenzhen
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 3 Wang Wu Shanghai
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 4 Zhao Liu Beijing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 5 Sun Qi Wuhan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 6 A Da Chengdu
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 7 A Er Nanjing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 8 A San Beijing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;8 rows selected.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Target (PostgreSQL):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ecsdb=# select * from student_info;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id | name | address 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----+------+---------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1 | Zhang San | Guangzhou
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2 | Li Si | Shenzhen
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 3 | Wang Wu | Shanghai
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 4 | Zhao Liu | Beijing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 5 | Sun Qi | Wuhan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 6 | A Da | Chengdu
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 7 | A Er | Nanjing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 8 | A San | Beijing
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(8 rows)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Insert data on source:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; ecs.student_info &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;bb&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Commit&lt;/span&gt; complete.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Check target — data synchronized successfully.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Original link: &lt;a href="https://lastdba.com/2024/08/13/ogg" target="_blank" rel="noreferrer"&gt;https://lastdba.com/2024/08/13/ogg&lt;/a&gt;搭建oracle-pg同步实操步骤/&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>OGG PostgreSQL-to-Oracle Sync — Hands-On Steps</title><link>https://lastdba.com/en/2024/08/13/ogg-postgresql-to-oracle-sync-hands-on-steps/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/ogg-postgresql-to-oracle-sync-hands-on-steps/</guid><description>&lt;p&gt;OGG software version: (19.1.0.0.4)
Oracle version: 11.2.0.4
PG version: pg10
OGG download: &lt;a href="https://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html" target="_blank" rel="noreferrer"&gt;https://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;glibc issue handling: &lt;a href="https://www.cnblogs.com/hxlasky/p/16779047.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/hxlasky/p/16779047.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6d5467d7acac.png" alt="img" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;1. Create Database and Table on Source
 &lt;div id="1-create-database-and-table-on-source" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-create-database-and-table-on-source" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root&lt;span style="color:#f92672"&gt;@&lt;/span&gt;node2 &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#f92672"&gt;#&lt;/span&gt; su &lt;span style="color:#f92672"&gt;-&lt;/span&gt; postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Last&lt;/span&gt; login: Tue Jul &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; CST &lt;span style="color:#ae81ff"&gt;2020&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; pts&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;node2 &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pg_ctl &lt;span style="color:#f92672"&gt;-&lt;/span&gt;D &lt;span style="color:#f92672"&gt;/&lt;/span&gt;opt&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pgsql_data &lt;span style="color:#f92672"&gt;-&lt;/span&gt;l logfile &lt;span style="color:#66d9ef"&gt;start&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;start&lt;/span&gt;.... done
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; test
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab1(id int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,name varchar(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;2. Create Database and Table on Target
 &lt;div id="2-create-database-and-table-on-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2-create-database-and-table-on-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sqlplus &lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; sysdba
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; ORALZL.tab1(id number &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,name varchar2(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;3. Extract and Install OGG for PostgreSQL
 &lt;div id="3-extract-and-install-ogg-for-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#3-extract-and-install-ogg-for-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Unlike OGG for Oracle, OGG for PG only needs extraction. Oracle version requires running runInstaller.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@node1 ~]$ id postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid=54323(postgres) gid=54330(postgres) groups=54330(postgres)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@node1 ~]$ exit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logout
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 ~]# mkdir /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 ~]# chown -R postgres /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 ~]# chmod -R 755 /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 ~]#
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# ls -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total 240744
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r--r--. 1 root root 87028695 Jul 22 02:51 19100200714_ggs_Linux_x64_PostgreSQL_64bit.zip
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# chmod 777 19100200714_ggs_Linux_x64_PostgreSQL_64bit.zip
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# unzip 19100200714_ggs_Linux_x64_PostgreSQL_64bit.zip
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Archive: 19100200714_ggs_Linux_x64_PostgreSQL_64bit.zip
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; inflating: ggs_Linux_x64_PostgreSQL_64bit.tar 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; inflating: OGG-19.1.0.0-README.txt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; inflating: release-notes-oracle-goldengate_19.1.0.200714.pdf 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# chmod 777 ggs_Linux_x64_PostgreSQL_64bit.tar
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# su - postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@node1 ~]$ cd /soft
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@node1 soft]$ tar -xf ggs_Linux_x64_PostgreSQL_64bit.tar -C /ogg&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;4. Configure PG User Environment Variables
 &lt;div id="4-configure-pg-user-environment-variables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#4-configure-pg-user-environment-variables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Source PG:&lt;/p&gt;</description><content:encoded>&lt;p&gt;OGG software version: (19.1.0.0.4)
Oracle version: 11.2.0.4
PG version: pg10
OGG download: &lt;a href="https://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html" target="_blank" rel="noreferrer"&gt;https://www.oracle.com/technetwork/middleware/goldengate/downloads/index.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;glibc issue handling: &lt;a href="https://www.cnblogs.com/hxlasky/p/16779047.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/hxlasky/p/16779047.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6d5467d7acac.png" alt="img" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;1. Create Database and Table on Source
 &lt;div id="1-create-database-and-table-on-source" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-create-database-and-table-on-source" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root&lt;span style="color:#f92672"&gt;@&lt;/span&gt;node2 &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#f92672"&gt;#&lt;/span&gt; su &lt;span style="color:#f92672"&gt;-&lt;/span&gt; postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Last&lt;/span&gt; login: Tue Jul &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; CST &lt;span style="color:#ae81ff"&gt;2020&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; pts&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;node2 &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pg_ctl &lt;span style="color:#f92672"&gt;-&lt;/span&gt;D &lt;span style="color:#f92672"&gt;/&lt;/span&gt;opt&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pgsql_data &lt;span style="color:#f92672"&gt;-&lt;/span&gt;l logfile &lt;span style="color:#66d9ef"&gt;start&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;start&lt;/span&gt;.... done
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server started
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; test
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab1(id int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,name varchar(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;2. Create Database and Table on Target
 &lt;div id="2-create-database-and-table-on-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2-create-database-and-table-on-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sqlplus &lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; sysdba
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; ORALZL.tab1(id number &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,name varchar2(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;3. Extract and Install OGG for PostgreSQL
 &lt;div id="3-extract-and-install-ogg-for-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#3-extract-and-install-ogg-for-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Unlike OGG for Oracle, OGG for PG only needs extraction. Oracle version requires running runInstaller.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@node1 ~]$ id postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uid=54323(postgres) gid=54330(postgres) groups=54330(postgres)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@node1 ~]$ exit
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;logout
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 ~]# mkdir /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 ~]# chown -R postgres /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 ~]# chmod -R 755 /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 ~]#
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# ls -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total 240744
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r--r--. 1 root root 87028695 Jul 22 02:51 19100200714_ggs_Linux_x64_PostgreSQL_64bit.zip
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# chmod 777 19100200714_ggs_Linux_x64_PostgreSQL_64bit.zip
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# unzip 19100200714_ggs_Linux_x64_PostgreSQL_64bit.zip
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Archive: 19100200714_ggs_Linux_x64_PostgreSQL_64bit.zip
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; inflating: ggs_Linux_x64_PostgreSQL_64bit.tar 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; inflating: OGG-19.1.0.0-README.txt 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; inflating: release-notes-oracle-goldengate_19.1.0.200714.pdf 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# chmod 777 ggs_Linux_x64_PostgreSQL_64bit.tar
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@node1 soft]# su - postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@node1 ~]$ cd /soft
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@node1 soft]$ tar -xf ggs_Linux_x64_PostgreSQL_64bit.tar -C /ogg&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;4. Configure PG User Environment Variables
 &lt;div id="4-configure-pg-user-environment-variables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#4-configure-pg-user-environment-variables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Source PG:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat .bash_profile
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## .bash_profile&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## Get the aliases and functions&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt; -f ~/.bashrc &lt;span style="color:#f92672"&gt;]&lt;/span&gt;; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;. ~/.bashrc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;fi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## User specific environment and startup programs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PATH&lt;span style="color:#f92672"&gt;=&lt;/span&gt;$PATH:$HOME/.local/bin:$HOME/bin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export GGHOME&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export PG_DATA&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/opt/pgsql/pgsql/bin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export PATH&lt;span style="color:#f92672"&gt;=&lt;/span&gt;$PG_DATA:$PATH
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export PG_HOME&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/opt/pgsql/pgsql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export LD_LIBRARY_PATH&lt;span style="color:#f92672"&gt;=&lt;/span&gt;$PG_HOME/lib:$LD_LIBRARY_PATH:$GGHOME/lib
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export ODBCINI&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/home/postgres/odbc.ini
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export DD_ODBC_HOME&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export PATH
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ source .bash_profile&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;5. Configure Manager Process
 &lt;div id="5-configure-manager-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#5-configure-manager-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cd /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ogg&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ./ggsci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Oracle GoldenGate Command Interpreter &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; PostgreSQL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Version 19.1.0.0.200714 OGGCORE_19.1.0.0.0OGGBP_PLATFORMS_200628.2141
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Linux, x64, 64bit &lt;span style="color:#f92672"&gt;(&lt;/span&gt;optimized&lt;span style="color:#f92672"&gt;)&lt;/span&gt;, PostgreSQL on Jun &lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2020&lt;/span&gt; 03:59:15
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Operating system character set identified as UTF-8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Copyright &lt;span style="color:#f92672"&gt;(&lt;/span&gt;C&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 1995, 2019, Oracle and/or its affiliates. All rights reserved.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 2&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER STOPPED 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 3&amp;gt; create subdirs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Creating subdirectories under current directory /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Parameter file /ogg/dirprm: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Report file /ogg/dirrpt: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Checkpoint file /ogg/dirchk: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Process status files /ogg/dirpcs: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SQL script files /ogg/dirsql: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database definitions files /ogg/dirdef: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Extract data files /ogg/dirdat: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Temporary files /ogg/dirtmp: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Credential store files /ogg/dircrd: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Masterkey wallet files /ogg/dirwlt: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Dump files /ogg/dirdmp: created.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 4&amp;gt; edit params mgr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 5&amp;gt; view params mgr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;port &lt;span style="color:#ae81ff"&gt;7809&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 6&amp;gt; start mgr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Manager started.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 7&amp;gt; info all
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Program Status Group Lag at Chkpt Time Since Chkpt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER RUNNING &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;6. Adjust Source PostgreSQL Parameters
 &lt;div id="6-adjust-source-postgresql-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#6-adjust-source-postgresql-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ogg&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ vi /opt/pgsql_data/postgresql.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_level &lt;span style="color:#f92672"&gt;=&lt;/span&gt; logical &lt;span style="color:#75715e"&gt;#minimal, replica, or logical&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_replication_slots &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#75715e"&gt;#max number of replication slots&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_wal_sender &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#75715e"&gt;#maximum number of wal sender processes&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_receiver_status_interval&lt;span style="color:#f92672"&gt;=&lt;/span&gt;10s &lt;span style="color:#75715e"&gt;#optional, keep the system default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_sender_timeout &lt;span style="color:#75715e"&gt;#optional, keep the system default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;track_commit_timestamp &lt;span style="color:#75715e"&gt;#optional, keep the system default&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_receiver_status_interval&lt;span style="color:#f92672"&gt;=&lt;/span&gt;10s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_sender_timeout &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 60s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;track_commit_timestamp&lt;span style="color:#f92672"&gt;=&lt;/span&gt;off&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Restart source PostgreSQL after adjustment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ogg&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pg_ctl -D /opt/pgsql_data -l logfile stop
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ogg&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pg_ctl -D /opt/pgsql_data -l logfile start&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;7. Configure OGG for PG Data Source
 &lt;div id="7-configure-ogg-for-pg-data-source" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#7-configure-ogg-for-pg-data-source" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cd /home/postgres/
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vi odbc.ini
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;ODBC Data Sources&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PGDSN&lt;span style="color:#f92672"&gt;=&lt;/span&gt;DataDirect 7.1 PostgreSQL Wire Protocol
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;DataDirect 7.1 PostgreSQL Wire Protocol
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;scott&lt;span style="color:#f92672"&gt;=&lt;/span&gt;DataDirect 7.1 PostgreSQL Wire Protocol
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;ODBC&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IANAAppCodePage&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;InstallDir&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;PGDSN&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Driver&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/ogg/lib/GGpsql25.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Description&lt;span style="color:#f92672"&gt;=&lt;/span&gt;DataDirect 7.1 PostgreSQL Wire Protocol
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Database&lt;span style="color:#f92672"&gt;=&lt;/span&gt;test
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HostName&lt;span style="color:#f92672"&gt;=&lt;/span&gt;192.168.1.112
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PortNumber&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5432&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LogonID&lt;span style="color:#f92672"&gt;=&lt;/span&gt;postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Password&lt;span style="color:#f92672"&gt;=&lt;/span&gt;postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;8. Connection Test
 &lt;div id="8-connection-test" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#8-connection-test" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cd /ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@node1 ogg&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ./ggsci
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--dblogin sourcedb pgdsn userid pg, password &lt;span style="color:#ae81ff"&gt;123456&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 1&amp;gt; dblogin sourcedb pgdsn userid postgres, password postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-07-22 03:10:44 INFO OGG-03036 Database character set identified as UTF-8. Locale: en_US.UTF-8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-07-22 03:10:44 INFO OGG-03037 Session character set identified as UTF-8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully logged into database.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1 as postgres@pgdsn&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 2&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;9. Enable Table-Level Supplemental Logging
 &lt;div id="9-enable-table-level-supplemental-logging" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#9-enable-table-level-supplemental-logging" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Source:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 3&amp;gt; dblogin sourcedb pgdsn userid postgres, password postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-07-22 03:21:01 INFO OGG-03036 Database character set identified as UTF-8. Locale: en_US.UTF-8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-07-22 03:21:01 INFO OGG-03037 Session character set identified as UTF-8.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully logged into database.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1 as postgres@pgdsn&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 4&amp;gt; add trandata public.tab1 --If table has primary key, this step can be skipped
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Logging of supplemental log data is enabled &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; table public.tab1. REPLICA IDENTITY was DEFAULT and is changed to FULL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1 as postgres@pgdsn&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 5&amp;gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1 as postgres@pgdsn&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 5&amp;gt; info trandata public.tab1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Logging of supplemental log data is enabled &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; table public.t1 with REPLICA IDENTITY set to FULL&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;10. Register Extract Process on PG
 &lt;div id="10-register-extract-process-on-pg" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#10-register-extract-process-on-pg" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Registering an extract process on PG essentially creates a replication slot. The output plugin defaults to test_decoding.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1 as postgres@pgdsn&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 6&amp;gt; Register Extract ext_pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2020-07-22 03:25:27 INFO OGG-25355 Successfully created replication slot &lt;span style="color:#e6db74"&gt;&amp;#39;ext_pg_2947c06e0ea2ec74&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; EXTRACT group &lt;span style="color:#e6db74"&gt;&amp;#39;EXT_PG&amp;#39;&lt;/span&gt; in database &lt;span style="color:#e6db74"&gt;&amp;#39;test&amp;#39;&lt;/span&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;11. Configure Extract and Pump Processes
 &lt;div id="11-configure-extract-and-pump-processes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#11-configure-extract-and-pump-processes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Configure extract process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; edit param ext_pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SETENV &lt;span style="color:#f92672"&gt;(&lt;/span&gt; PGCLIENTENCODING &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;UTF8&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SETENV &lt;span style="color:#f92672"&gt;(&lt;/span&gt;NLS_LANG&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;AMERICAN_AMERICA.AL32UTF8&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; extract ext_pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SETENV &lt;span style="color:#f92672"&gt;(&lt;/span&gt;ODBCINI&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;/home/pg/odbc.ini&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;)&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SOURCEDB pgdsn, USERID pg, PASSWORD &lt;span style="color:#ae81ff"&gt;123456&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; exttrail ./dirdat/st
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE PUBLIC.TAB1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ----GETTRUNCATES &lt;span style="color:#75715e"&gt;### This feature on PostgreSQL 10.12: ERROR OGG-25541 GETTRUNCATES is not valid. PostgreSQL supports TRUNCATE capture from version 11.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note: PG to Oracle cannot sync TRUNCATE commands.&lt;/p&gt;
&lt;p&gt;Configure pump process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; extract pump_pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SETENV &lt;span style="color:#f92672"&gt;(&lt;/span&gt;ODBCINI&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;/home/pg/odbc.ini&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RMTHOST 172.17.100.150, MGRPORT 7809, compress
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; numfiles &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RMTTRAIL ./dirdat/rt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE PUBLIC.TAB1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;12. Add Trail and Start Extract/Pump
 &lt;div id="12-add-trail-and-start-extractpump" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#12-add-trail-and-start-extractpump" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ADD extract ext_pg, TRANLOG,BEGIN now
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; add exttrail ./dirdat/st,extract ext_pg,megabytes &lt;span style="color:#ae81ff"&gt;500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; add extract pump_pg,exttrailsource ./dirdat/st
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; add rmttrail ./dirdat/rt,extract pump_pg,megabytes &lt;span style="color:#ae81ff"&gt;500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; start ext_pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; start pump_pg&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;13. Configure defgen
 &lt;div id="13-configure-defgen" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#13-configure-defgen" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;If table structures are consistent, you can configure ASSUMETARGETDEFS.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;edit param defgen
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DEFSFILE ./dirdef/tab1.def, PURGE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SOURCEDB pgdsn, USERID pg, PASSWORD &lt;span style="color:#ae81ff"&gt;123456&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE PUBLIC.tab1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Generate table definition file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;defgen paramfile /oggpg/dirdef/tab1.prm&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Copy the defgen file to the target&amp;rsquo;s dirdef directory.&lt;/p&gt;

&lt;h4 class="relative group"&gt;14. Verify Trail Delivery on Target
 &lt;div id="14-verify-trail-delivery-on-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#14-verify-trail-delivery-on-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;oracle@lzl dirdat&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cd dirdat
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;oracle@lzl dirdat&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -rw-r----- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;1439&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; 11:02 rt000000000&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;15. Register Extract Process on PG
 &lt;div id="15-register-extract-process-on-pg" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#15-register-extract-process-on-pg" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Registering an extract process on PG creates a replication slot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node1 as postgres@pgdsn&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 6&amp;gt; Register Extract ext_pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2020-07-22 03:25:27 INFO OGG-25355 Successfully created replication slot &lt;span style="color:#e6db74"&gt;&amp;#39;ext_pg_2947c06e0ea2ec74&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; EXTRACT group &lt;span style="color:#e6db74"&gt;&amp;#39;EXT_PG&amp;#39;&lt;/span&gt; in database &lt;span style="color:#e6db74"&gt;&amp;#39;test&amp;#39;&lt;/span&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;16. Configure Oracle User Environment Variables
 &lt;div id="16-configure-oracle-user-environment-variables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#16-configure-oracle-user-environment-variables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;export ORACLE_BASE&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/oracle/app/oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; export ORACLE_HOME&lt;span style="color:#f92672"&gt;=&lt;/span&gt;$ORACLE_BASE/product/11.2.0/dbhome_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; export ORACLE_SID&lt;span style="color:#f92672"&gt;=&lt;/span&gt;oralzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; export OGG_HOME&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/oggfororacle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; export PATH&lt;span style="color:#f92672"&gt;=&lt;/span&gt;$ORACLE_HOME/bin:$ORACLE_HOME/OPatch:$PATH
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; export TNS_ADMIN&lt;span style="color:#f92672"&gt;=&lt;/span&gt;$ORACLE_HOME/network/admin
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; export LD_LIBRARY_PATH&lt;span style="color:#f92672"&gt;=&lt;/span&gt;$ORACLE_HOME/lib:$OGG_HOME:$ORACLE_HOME/lib32:/lib/usr/lib:/usr/local/lib&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;17. Configure Oracle Listener and TNS
 &lt;div id="17-configure-oracle-listener-and-tns" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#17-configure-oracle-listener-and-tns" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;OGG for Oracle defaults to using TNS_ADMIN&amp;rsquo;s tns.
You can also manually configure during extract configuration, e.g.: &lt;code&gt;USERID goldengate@127.0.0.1:1521/oralzl, PASSWORD 123456&lt;/code&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;18. Install OGG for Oracle on Target
 &lt;div id="18-install-ogg-for-oracle-on-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#18-install-ogg-for-oracle-on-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Download OGG software.
Configure oggcore.rsp file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle.install.responseFileVersion&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/home/oracle/oggcore.rsp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INSTALL_OPTION&lt;span style="color:#f92672"&gt;=&lt;/span&gt;ORA11g
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SOFTWARE_LOCATION&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/ogg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;START_MANAGER&lt;span style="color:#f92672"&gt;=&lt;/span&gt;false
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MANAGER_PORT&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7809&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATABASE_LOCATION&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/oracle/db/11.2.0.4
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INVENTORY_LOCATION&lt;span style="color:#f92672"&gt;=&lt;/span&gt;/oracle/oraInventory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;UNIX_GROUP_NAME&lt;span style="color:#f92672"&gt;=&lt;/span&gt;oinstall&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Silent install OGG:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./runInstaller -silent -nowait -responseFile /home/oracle/oggcore.rsp&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;19. Oracle Database User and Privileges
 &lt;div id="19-oracle-database-user-and-privileges" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#19-oracle-database-user-and-privileges" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create user goldengate identified by &lt;span style="color:#e6db74"&gt;&amp;#34;123456&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant create session,alter session to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant alter system to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant resource to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant connect to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; any dictionary to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant flashback any table to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; any table to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; any table to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant insert any table to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant update any table to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant delete any table to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; on dba_clusters to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant execute on dbms_flashback to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant create table to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant create sequence to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant alter any table to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant dba to goldengate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grant lock any table to goldengate;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;20. Target Manager Process
 &lt;div id="20-target-manager-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#20-target-manager-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;edit param mgr
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PORT &lt;span style="color:#ae81ff"&gt;7809&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DYNAMICPORTLIST 7810-7980
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS, MINKEEPDAYS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PURGEDDLHISTORY MINKEEPDAYS 7, MAXKEEPDAYS &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LAGREPORTHOURS &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LAGINFOMINUTES &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LAGCRITICALMINUTES &lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;start mgr&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;21. Configure Replicat Process on Target
 &lt;div id="21-configure-replicat-process-on-target" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#21-configure-replicat-process-on-target" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node2&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 8&amp;gt; dblogin userid goldengate@127.0.0.1:1521/oralzl,password &lt;span style="color:#ae81ff"&gt;123456&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GGSCI &lt;span style="color:#f92672"&gt;(&lt;/span&gt;node2 as postgres@pgdsn&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 9&amp;gt; add checkpointtable goldengate.chkt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Successfully created checkpoint table public.chkt.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Replicat process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; edit param rep_pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;REPLICAT rep_pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; USERID goldengate@127.0.0.1:1521/oralzl, PASSWORD &lt;span style="color:#ae81ff"&gt;123456&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SOURCEDEFS ./dirdef/tab1.def
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; MAP public.tab1, TARGET oralzl.tab1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;add replicat rep_pg,exttrail ./dirdat/rt,checkpointtable goldengate.chkt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; start rep_pg&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;22. Test Sync
 &lt;div id="22-test-sync" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#22-test-sync" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;node1 &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; psql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;test&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d tab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;​&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.tab1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------+-----------+----------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;t1_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; t2 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;node2 &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;sqlplus &lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; sysdba
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; oralzl.tab1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;​&lt;/span&gt; id name 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------- ----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;​&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; lzl1 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;blockquote&gt;&lt;p&gt;Original link: &lt;a href="https://lastdba.com/2024/08/13/ogg" target="_blank" rel="noreferrer"&gt;https://lastdba.com/2024/08/13/ogg&lt;/a&gt;搭建pg-oracle同步实操步骤/&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>PostgreSQL Logical Replication</title><link>https://lastdba.com/en/2024/08/13/postgresql-logical-replication/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/postgresql-logical-replication/</guid><description>&lt;h3 class="relative group"&gt;What is Logical Replication
 &lt;div id="what-is-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL logical replication is based on logical decoding, which parses WAL log streams into a specified format for output. The subscriber node receives the parsed data and applies it.&lt;/p&gt;
&lt;p&gt;Logical replication differs from streaming replication (physical replication) which is based on instance-level primary-standby where the physical structures are identical. Logical replication can selectively replicate at the table level. Logical Replication in official documentation specifically refers to the &amp;ldquo;publish-subscribe&amp;rdquo; model. In fact, many tools can use logical decoding for heterogeneous database data synchronization.&lt;/p&gt;</description><content:encoded>
&lt;h3 class="relative group"&gt;What is Logical Replication
 &lt;div id="what-is-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL logical replication is based on logical decoding, which parses WAL log streams into a specified format for output. The subscriber node receives the parsed data and applies it.&lt;/p&gt;
&lt;p&gt;Logical replication differs from streaming replication (physical replication) which is based on instance-level primary-standby where the physical structures are identical. Logical replication can selectively replicate at the table level. Logical Replication in official documentation specifically refers to the &amp;ldquo;publish-subscribe&amp;rdquo; model. In fact, many tools can use logical decoding for heterogeneous database data synchronization.&lt;/p&gt;
&lt;p&gt;pg9.4&amp;rsquo;s pglogical plugin can support logical replication (&lt;a href="https://github.com/2ndQuadrant/pglogical" target="_blank" rel="noreferrer"&gt;https://github.com/2ndQuadrant/pglogical&lt;/a&gt;), and pg10 onwards natively supports logical replication.&lt;/p&gt;
&lt;p&gt;Logical replication can be used for database upgrades, heterogeneous data migration, table-level data synchronization links, subscribing to data streams, etc.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Logical Decoding
 &lt;div id="logical-decoding" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-decoding" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Logical decoding can parse table data changes in WAL logs into row data streams or SQL text. These row data streams or SQL text can be consumed by other types of databases or software. The specific parsing format is determined by the output plugin.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Replication Slots
 &lt;div id="replication-slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replication-slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In logical replication, a replication slot represents a data change stream. Like physical replication slots, logical replication slots also ensure that after an abnormal replication interruption, the related WAL logs are not deleted, so that WAL log parsing can continue after replication reconnects. A database can have multiple replication slots. Each replication slot has only one output plugin, and each replication slot represents one replication link. Replication slots are essentially used to manage replication links. Unlike streaming replication which can function without replication slots, logical replication must have replication slots.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Output Plugin
 &lt;div id="output-plugin" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#output-plugin" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The output plugin converts WAL log information into the format required by the replication slot. PostgreSQL has some built-in output plugins and additional ones can be added through plugins. Each logical replication slot has an output plugin for WAL-related parsing work.&lt;/p&gt;
&lt;p&gt;Output plugins use callback functions to manage parsing. For example, OUTPUT_PLUGIN_BINARY_OUTPUT and OUTPUT_PLUGIN_TEXTUAL_OUTPUT are used to set whether the out_type is binary or text. There are also callback functions to notify the plugin of transaction data changes and sort transactions. Callback functions of course don&amp;rsquo;t need to be used manually; some built-in output plugins are already packaged.&lt;/p&gt;
&lt;p&gt;Each output plugin has some different parsing behaviors and output formats.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Several Common Output Plugins
 &lt;div id="several-common-output-plugins" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#several-common-output-plugins" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;test_decoding: This is a sample output plugin, essentially the raw form of an output plugin. Official documentation says it&amp;rsquo;s a template, but it can still parse. This output plugin comes with PostgreSQL but needs to be compiled in contrib.&lt;/p&gt;
&lt;p&gt;pgoutput: The default output plugin for the publish-subscribe model. In publish-subscribe, the walsender process uses this output plugin to logically decode WAL logs.&lt;/p&gt;
&lt;p&gt;decoder_raw: Parses into SQL text format. This is not included with PostgreSQL; compile it yourself: &lt;a href="https://github.com/michaelpq/pg_plugins/tree/main/decoder_raw" target="_blank" rel="noreferrer"&gt;https://github.com/michaelpq/pg_plugins/tree/main/decoder_raw&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;wal2json: This output plugin converts WAL log information into JSON format.&lt;/p&gt;
&lt;p&gt;Other output plugins can be referenced at: &lt;a href="https://wiki.postgresql.org/wiki/Logical_Decoding_Plugins" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Logical_Decoding_Plugins&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Some domestic vendors have also made their own output plugins.&lt;/p&gt;
&lt;p&gt;Relationship between several output plugins and logical replication plugins:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8681978ee447.png" alt="5bc6c1dacf2c4f4888f2e299d3d75bc6.png" /&gt;


&lt;img src="data:image/gif;base64,R0lGODlhAQABAPABAP///wAAACH5BAEKAAAALAAAAAABAAEAAAICRAEAOw==" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;pgoutput, test_decoding, and wal2json have been introduced above.&lt;/p&gt;
&lt;p&gt;pglogical was the predecessor of pglogical replication in pg9.4.&lt;/p&gt;
&lt;p&gt;BDR was developed by 2ndQuadrant, supporting bidirectional replication and DDL synchronization with more powerful features. BDR 3.0 onwards became closed-source.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Functions and Tools for Manually Receiving Parsed Data
 &lt;div id="functions-and-tools-for-manually-receiving-parsed-data" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#functions-and-tools-for-manually-receiving-parsed-data" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;pg_logical_slot_get_changes(): Displays parsed data and consumes it.&lt;/p&gt;
&lt;p&gt;pg_logical_slot_peek_changes(): Displays parsed data without consuming it.&lt;/p&gt;
&lt;p&gt;pg_recvlogical: A tool included with PostgreSQL that can consume data within a replication slot, equivalent to the downstream of logical replication. The corresponding physical WAL receiving tool is pg_receivewal.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Logical Decoding Test 1&lt;/strong&gt;: Observing data parsing with 2 different output plugins
 &lt;div id="logical-decoding-test-1-observing-data-parsing-with-2-different-output-plugins" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-decoding-test-1-observing-data-parsing-with-2-different-output-plugins" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create two logical replication slots using logical_test and logical_raw respectively
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test_decoding&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_create_logical_replication_slot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (logical_test,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1756&lt;/span&gt;F50)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_raw&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;decoder_raw&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_create_logical_replication_slot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (logical_raw,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1756&lt;/span&gt;F88)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Only the upstream is created, slot is in f state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wal_status &lt;span style="color:#f92672"&gt;|&lt;/span&gt; safe_wal_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+---------------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logical_test &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_decoding &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16385&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;558&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1766878&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17668&lt;/span&gt;B0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reserved &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logical_raw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; decoder_raw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16385&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;557&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1756&lt;/span&gt;F50 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1756&lt;/span&gt;F88 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reserved &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create a table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tdecoder222(a int,b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Attempt to get this DDL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_logical_slot_get_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_raw&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;include-xids&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#66d9ef"&gt;option&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;include-xids&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;0&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;unknown&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CONTEXT: slot &lt;span style="color:#e6db74"&gt;&amp;#34;logical_raw&amp;#34;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;output&lt;/span&gt; plugin &lt;span style="color:#e6db74"&gt;&amp;#34;decoder_raw&amp;#34;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the startup callback
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_logical_slot_get_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;include-xids&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-----+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17669&lt;/span&gt;C8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;558&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776778&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;558&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- We can see that decoder_raw didn&amp;#39;t parse the DDL at all, and logical_test only got the DDL transaction without the DDL statement itself, essentially not parsing the DDL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert a row
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-----+---------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776890&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776890&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222: &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt;: a[integer]:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776900&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_raw&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-----+----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776890&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;560&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 (a, b) &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- test_decoding parsed the transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- decoder_raw parsed the transaction into SQL statements&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This test allows us to conclude:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Replication slots in f state still parse, waiting for downstream consumption&lt;/li&gt;
&lt;li&gt;Each output plugin has some different parsing behaviors and output formats&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 class="relative group"&gt;Logical Decoding Test 2: Using pg_recvlogical to receive logically decoded data, simulating a logical replication link
 &lt;div id="logical-decoding-test-2-using-pg_recvlogical-to-receive-logically-decoded-data-simulating-a-logical-replication-link" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-decoding-test-2-using-pg_recvlogical-to-receive-logically-decoded-data-simulating-a-logical-replication-link" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Configure passwordless login
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ vi .pgpass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat .pgpass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzl:5410:lzldb:pg:pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ chmod &lt;span style="color:#ae81ff"&gt;0600&lt;/span&gt; .pgpass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- Start pg_recvlogical
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pg_recvlogical -h lzl -p &lt;span style="color:#ae81ff"&gt;5410&lt;/span&gt; -d lzldb -U pg --slot&lt;span style="color:#f92672"&gt;=&lt;/span&gt;logical_raw --start -f recv.sql &amp;amp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ps -ef|grep recv|grep -v grep
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;7747&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7355&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 21:40 pts/3 00:00:00 pg_recvlogical -h lzl -p &lt;span style="color:#ae81ff"&gt;5410&lt;/span&gt; -d lzldb -U pg --slot&lt;span style="color:#f92672"&gt;=&lt;/span&gt;logical_raw --start -f recv.sql&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;qwe&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;asd&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; tail &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;f recv.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 (a, b) &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;qwe&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- update was not correctly parsed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add a primary key to the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;lzl2&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tdecoder222 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzlupdate&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; tail &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;f recv.&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 (a, b) &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 (a, b) &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;lzl2&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tdecoder222 &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;, b &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzlupdate&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&amp;ndash; After adding a primary key, update was correctly parsed by decoder_raw
&amp;ndash; Without a primary key, it won&amp;rsquo;t be correctly parsed. This is related to replica identity, which will be introduced later.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Prerequisites for Logical Replication
 &lt;div id="prerequisites-for-logical-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#prerequisites-for-logical-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;1. Parameters
 &lt;div id="1-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;1.1 Basic Required Parameters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;wal_level. Takes effect after restart, default is replica. The wal_level parameter must be logical. logical does not change WAL to logical; it means that on top of supporting physical replication (replica), the necessary information for logical decoding is added. Since pg9.6, there are only minimal, replica, and logical, with information content increasing successively.&lt;/li&gt;
&lt;li&gt;max_replication_slots. Takes effect after restart, default value below pg9.6 is 0, pg10 and above is 10. 10 is generally sufficient. Like physical replication, logical replication generally also uses replication slots. PostgreSQL backups and physical replication can both occupy replication slot counts.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;1.2 Source-side Required Parameters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;max_wal_senders. Takes effect after restart, default 10. Sender process count limit. The publisher&amp;rsquo;s sender transmits the parsed logs. Generally, one logical replication slot corresponds to one sender and one worker. This is similar to physical replication, where one physical replication slot corresponds to one sender and one receiver.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;1.3 Target-side Required Parameters&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;max_worker_processes. Takes effect after restart, default 8. Worker process count limit. Parallel processes (parallel queries, parallel statistics collection, etc., limited by max_parallel_workers), logical replication worker processes (max_logical_replication_workers), and some other programs that need to fork workers are all related to this parameter. It should be set to max_parallel_workers + logical replication apply workers + other background workers.&lt;/li&gt;
&lt;li&gt;max_logical_replication_workers. Takes effect after restart, default 4. Logical replication worker process count, including logical replication apply worker processes and table sync worker processes.&lt;/li&gt;
&lt;li&gt;max_sync_workers_per_subscription. Takes effect after reload, default 2. Sync worker processes when adding new tables to logical replication. Currently, one table has only one parallel.&lt;/li&gt;
&lt;li&gt;The above three parameters are tiered: max_sync_workers_per_subscription &amp;lt; max_logical_replication_workers &amp;lt; max_worker_processes. In short, there must be workers available.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 class="relative group"&gt;2. Permissions
 &lt;div id="2-permissions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2-permissions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Replication user permissions. Logical replication users need replication privileges.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;ALTER ROLE &amp;lt;usename&amp;gt; WITH REPLICATION;&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;HBA access restrictions, allowing downstream to access the database using the replication user.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;host lzldb user1 172.17.100.150/32 md5&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For the publish-subscribe model, CREATE permission on the database or superuser permission is needed.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When creating a publication, for table only, at least the table owner with CREATE permission is needed. All other publications require superuser.&lt;/p&gt;
&lt;p&gt;When creating a subscription, superuser is required.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;grant create on database lzl1db to owner1;&lt;/code&gt; or&lt;/p&gt;
&lt;p&gt;&lt;code&gt;alter user replicate1 superuser;&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Additionally, read or write permissions on tables during replication are also necessary.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Logical Synchronization Between PostgreSQL Instances — Publish and Subscribe
 &lt;div id="logical-synchronization-between-postgresql-instances--publish-and-subscribe" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-synchronization-between-postgresql-instances--publish-and-subscribe" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL&amp;rsquo;s built-in logical replication is based on the publish-subscribe model. The publish-subscribe model does not parse into SQL for application.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Publication
 &lt;div id="publication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;A publisher can have multiple publications, and each publication can have multiple tables.&lt;/li&gt;
&lt;li&gt;When publishing, you can specify:&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;for table&lt;/code&gt; — publishes certain tables. New tables need to be explicitly added with ALTER PUBLICATION ADD TABLE. At minimum, the table owner is needed to create this publication.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;for all tables&lt;/code&gt; — publishes all tables under the database. New tables are automatically published. Superuser is required to create this publication.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;for all tables in schema&lt;/code&gt; — publishes all tables under the schema. New tables are automatically published. Superuser is required to create this publication. Supported starting from pg15.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Publications by default include INSERT, UPDATE, DELETE, and TRUNCATE. You can also specify to replicate only certain commands. DDL is not synchronized. (Official documentation verbatim. This means truncate is not considered DDL in PostgreSQL — leaving this as a topic for later research. Truncate is DDL in MySQL and Oracle.)&lt;/li&gt;
&lt;li&gt;Only base tables can be published; temporary tables, foreign tables, views, sequences, etc. cannot be published. Partitioned table publishing is related to PostgreSQL version and partition attributes. pg15 defaults to publishing all partitions of a partitioned table.&lt;/li&gt;
&lt;li&gt;publish_via_partition_root. Supported from pg13. This publication parameter indicates whether partitioned tables use partitions for filtering (false, default) or use the parent partition for row filtering. If set to true, heterogeneous partitioned table logical replication is supported, such as partitioned table to regular table replication. truncate replication is not possible when true.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 class="relative group"&gt;Subscription
 &lt;div id="subscription" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subscription" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;A subscription has only one publisher but can subscribe to multiple publications on the publisher.&lt;/li&gt;
&lt;li&gt;A subscriber can have multiple subscriptions, each receiving data from one replication slot.&lt;/li&gt;
&lt;li&gt;One subscription corresponds to one replication slot, which is on the publisher side.&lt;/li&gt;
&lt;li&gt;When creating or deleting a subscription, the replication slot is automatically created or deleted on the publisher by default.&lt;/li&gt;
&lt;li&gt;Creating a subscription requires superuser.&lt;/li&gt;
&lt;li&gt;DDL is not synchronized; tables must already be created.&lt;/li&gt;
&lt;li&gt;Existing data is synchronized by default, via COPY snapshot to the subscriber.&lt;/li&gt;
&lt;li&gt;Synchronization can be paused and resumed with ALTER SUBSCRIPTION sub1 {ENABLE|DISABLE}.&lt;/li&gt;
&lt;li&gt;When a publication adds new tables, refresh is needed on the subscriber side: alter subscription sub1 refresh publication.&lt;/li&gt;
&lt;li&gt;Schema names, table names, and column names must be consistent between publication and subscription. Column types can differ (as long as implicit conversion succeeds). Column order can be different.&lt;/li&gt;
&lt;li&gt;Subscriptions also have some attributes, such as binary transfer, streaming, synchronous commit, two-phase commit, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/71469ba164fc.png" alt="d48af56aa7fc4df89b429605b2e049a9.png" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;logical replication launcher is used to start the subscriber-side worker processes and only exists at startup.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*-------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * IDENTIFICATION
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * src/backend/replication/logical/launcher.c
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * NOTES
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This module contains the logical replication worker launcher which
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * uses the background worker infrastructure to start the logical
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * replication workers for every enabled subscription.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *-------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Publish-Subscribe Related Views
 &lt;div id="publish-subscribe-related-views" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publish-subscribe-related-views" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;pg_publication; &amp;ndash; View publications. Publications themselves are stateless; replication slots are stateful, so there&amp;rsquo;s no pg_stat_publication.&lt;/p&gt;
&lt;p&gt;pg_publication_tables &amp;ndash; View published tables, simple and clear.&lt;/p&gt;
&lt;p&gt;pg_publication_rel &amp;ndash; View published tables, all IDs.&lt;/p&gt;
&lt;p&gt;pg_stat_subscription &amp;ndash; View subscription status, pid is the worker process pid.&lt;/p&gt;
&lt;p&gt;pg_subscription &amp;ndash; View subscriptions.&lt;/p&gt;
&lt;p&gt;pg_subscription_rel &amp;ndash; View subscription tables. There&amp;rsquo;s no pg_subscription_tables. Additionally, this view can show the sync status of individual tables under a subscription, which the replication slot view cannot do.&lt;/p&gt;
&lt;p&gt;\dRp list replication publications&lt;/p&gt;
&lt;p&gt;\dRs list replication subscriptions&lt;/p&gt;

&lt;h2 class="relative group"&gt;Creating a Publication and Subscription
 &lt;div id="creating-a-publication-and-subscription" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-a-publication-and-subscription" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Using a dedicated replication user replicate1, create a publication and subscription in the database lzldb to implement logical replication of table trep1.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;code&gt;Role&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Host IP&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Port&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Database&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Schema&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Table&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Replication User&lt;/code&gt;&lt;/th&gt;
 &lt;th&gt;&lt;code&gt;Version&lt;/code&gt;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;Publisher&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;172.17.100.150&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;5410&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;lzldb&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;public&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;trep1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;replicate1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;pg13&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;code&gt;Subscriber&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;172.17.100.150&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;5412&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;lzlbd&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;public&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;trep1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;replicate1&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;pg13&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;Creating the Publication
 &lt;div id="creating-the-publication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-the-publication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Modify&lt;/span&gt; postgres.conf, wal_level &lt;span style="color:#66d9ef"&gt;parameter&lt;/span&gt; takes effect &lt;span style="color:#66d9ef"&gt;after&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;restart&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_level&lt;span style="color:#f92672"&gt;=&lt;/span&gt;logical 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Modify&lt;/span&gt; pg_hba.conf file, takes effect &lt;span style="color:#66d9ef"&gt;after&lt;/span&gt; reload
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;host&lt;/span&gt; lzldb replicate1 &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; md5
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create replication user and grant privileges
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; replicate1 &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; password &lt;span style="color:#e6db74"&gt;&amp;#39;replicate1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; replicate1 &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; lzldb &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; replicate1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create the table to be replicated and grant privileges to the replication user
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb replicate1 &lt;span style="color:#75715e"&gt;-- If the replication user is not the table owner, should grant select on trep1 to replicate1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; trep1(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,b char(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; trep1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create publication, superuser can also be used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb replicate1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; publication pub_lzl1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; trep1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- View publication. \dRp or pg_publication
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_publication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubowner &lt;span style="color:#f92672"&gt;|&lt;/span&gt; puballtables &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubinsert &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubupdate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubdelete &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubtruncate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pubviaroot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------|----------|----------|--------------|-----------|-----------|-----------|-------------|-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16400&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pub_lzl1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16392&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Creating the Subscription
 &lt;div id="creating-the-subscription" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-the-subscription" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create table definition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; trep1(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,b char(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Use superuser to create subscription
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SUBSCRIPTION sub_test
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CONNECTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;host=172.17.100.150 port=5410 dbname=lzldb user=replicate1 password=replicate1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PUBLICATION pub_lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_subscription; &lt;span style="color:#75715e"&gt;-- View subscription. \dRs or pg_subscription
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subdbid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subowner &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subenabled &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subconninfo &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subslotname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subsynccommit &lt;span style="color:#f92672"&gt;|&lt;/span&gt; subpublications 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------|---------|----------|----------|------------|--------------------------------------------------------------------------------|-------------|---------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sub_test &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;host&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt; port&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5410&lt;/span&gt; dbname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;lzldb &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;replicate1 password&lt;span style="color:#f92672"&gt;=&lt;/span&gt;replicate1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sub_test &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;pub_lzl1&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; trep1; &lt;span style="color:#75715e"&gt;-- Verify existing data has been synchronized
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Publish-Subscribe Model Test 1: Truncate Synchronization
 &lt;div id="publish-subscribe-model-test-1-truncate-synchronization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publish-subscribe-model-test-1-truncate-synchronization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; trep1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; trep1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; trep1; &lt;span style="color:#75715e"&gt;-- In publish-subscribe mode, truncate is synchronized
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Publish-Subscribe Model Test 2: Adding New Table Synchronization
 &lt;div id="publish-subscribe-model-test-2-adding-new-table-synchronization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publish-subscribe-model-test-2-adding-new-table-synchronization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Under an existing publish-subscribe, add a new table synchronization. lzldb is publisher, lzlbd is subscriber
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_pk(a int,b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_pk &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; publication pub_lzl1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_pk;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; PUBLICATION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After adding a table on the publisher, refresh must be executed on the subscriber. Refresh defaults to synchronizing existing data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; subscription sub_test refresh publication; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; SUBSCRIPTION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_subscription_rel ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; srsubid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srsubstate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srsublsn 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+---------+------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16389&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;F2898
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16400&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; d &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Subscription state codes: i = initializing, d = copying data, s = synchronized, r = ready (normal replication)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- At this point, table tab_pk data has not been synchronized because the subscriber&amp;#39;s replication user lacks query permission on the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; replicate1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_subscription_rel ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; srsubid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srsubstate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; srsublsn 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+---------+------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16389&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;F2898
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16394&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16400&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;D830
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Subscription is in ready state, new table synchronization complete&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Replica Identity
 &lt;div id="replica-identity" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replica-identity" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Replica identity is written into WAL logs to identify a row of data. Whether it&amp;rsquo;s publish-subscribe or third-party logical sync tools, they all need to locate rows in the table to identify which row downstream the update or delete affects.&lt;/p&gt;
&lt;p&gt;PostgreSQL supports 4 replica identity modes.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;default(d): Default identity for non-system tables. Uses primary key if the table has one; if no primary key, it&amp;rsquo;s nothing.&lt;/li&gt;
&lt;li&gt;index(i): Uses a non-null unique index as the identity. Must be non-null and unique to identify a row. If only unique, there can be multiple null values. You can also explicitly specify the primary key in index mode.&lt;/li&gt;
&lt;li&gt;full(f): Uses all columns of the row as the identity. Full mode increases WAL log volume.&lt;/li&gt;
&lt;li&gt;nothing(n): Default mode for system tables. No identity; update and delete cannot affect downstream.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- View table&amp;#39;s replica identity:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tabname1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- When a table&amp;#39;s replica identity is i, check if the index is the replica identity:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d tabname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; rel.relname,idx.indisreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index idx ,pg_class rel &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; idx.indexrelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;rel.oid &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_1&amp;#39;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Modify table replica identity:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; tab1 REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; index_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOTHING&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Replica Identity Test 1: Setting a non-null unique index as replica identity for a table without a primary key
 &lt;div id="replica-identity-test-1-setting-a-non-null-unique-index-as-replica-identity-for-a-table-without-a-primary-key" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replica-identity-test-1-setting-a-non-null-unique-index-as-replica-identity-for-a-table-without-a-primary-key" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_idx(a int,b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tab_idx&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relreplident 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tab_idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; d
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;unique&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab_idx(b);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_idx &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- The index used as replica identity must be a non-null unique index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; rel.relname,idx.indisreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index idx ,pg_class rel &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; idx.indexrelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;rel.oid &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indisreplident 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_idx REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_1; &lt;span style="color:#75715e"&gt;-- Modify table&amp;#39;s replica identity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; rel.relname,idx.indisreplident &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index idx ,pg_class rel &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; idx.indexrelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;rel.oid &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indisreplident 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d tab_idx &lt;span style="color:#75715e"&gt;-- pg_index or \d to view index replica identity. \d can only display explicitly modified index replica identity
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.tab_idx&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------+-----------+----------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; b &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_1&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UNIQUE&lt;/span&gt;, btree (b) REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Replica Identity Test 2: Full mode — can duplicate rows be synchronized normally?
 &lt;div id="replica-identity-test-2-full-mode--can-duplicate-rows-be-synchronized-normally" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replica-identity-test-2-full-mode--can-duplicate-rows-be-synchronized-normally" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Execute the following on the publisher
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_full (a int,b varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;)); &lt;span style="color:#75715e"&gt;-- Add table sync without primary key and non-null index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- Insert 5 identical rows
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; replicate1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; publication tab_full &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_pk;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; PUBLICATION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; subscription sub_test refresh publication; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; SUBSCRIPTION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; ctid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;(0,2)&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: cannot &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;tab_full&amp;#34;&lt;/span&gt; because it does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; have a replica &lt;span style="color:#66d9ef"&gt;identity&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; publishes deletes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: &lt;span style="color:#66d9ef"&gt;To&lt;/span&gt; enable deleting &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; ctid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;(0,5)&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: cannot &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;tab_full&amp;#34;&lt;/span&gt; because it does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; have a replica &lt;span style="color:#66d9ef"&gt;identity&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; publishes updates
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: &lt;span style="color:#66d9ef"&gt;To&lt;/span&gt; enable updating the &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; REPLICA &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- When the table&amp;#39;s replica identity is d(default), without a primary key it&amp;#39;s nothing. nothing cannot replicate delete and update.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab_full replica &lt;span style="color:#66d9ef"&gt;identity&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;full&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; ctid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;(0,2)&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- After setting replica identity to full, delete succeeds
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full ; &lt;span style="color:#75715e"&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab_full &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; ctid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;(0,5)&amp;#39;&lt;/span&gt;; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlbd&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; ctid,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab_full ; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&amp;ndash; This example proves 3 points:
&amp;ndash; 1. When replica identity is d(default), it defaults to primary key; if no primary key, it&amp;rsquo;s nothing.
&amp;ndash; 2. nothing cannot replicate delete and update.
&amp;ndash; 3. Duplicate data in full mode can also be normally logically replicated. Although the ctid of data rows differs, the replication goal is still achieved.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Third-Party Synchronization Software
 &lt;div id="third-party-synchronization-software" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#third-party-synchronization-software" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Third-party synchronization software already has relatively mature solutions and is widely used, such as OGG, DTS, KTL, etc.&lt;/p&gt;
&lt;p&gt;These sync tools are very flexible. They can achieve true heterogeneous synchronization, from PostgreSQL databases to different databases or Kafka, big data consumption platforms, etc.&lt;/p&gt;
&lt;p&gt;Of course, they can also sync from other architecture data platforms to PostgreSQL databases, such as the now common Oracle to PostgreSQL sync scenario.&lt;/p&gt;
&lt;p&gt;Since we&amp;rsquo;re mainly discussing the PostgreSQL database itself, when PostgreSQL acts as the downstream target, it&amp;rsquo;s just some data write issues with very few problems. There won&amp;rsquo;t be logical decoding, replication slot issues, etc. So this small section won&amp;rsquo;t discuss PostgreSQL as a heterogeneous sync target. We&amp;rsquo;ll only observe and summarize scenarios where PostgreSQL acts as the upstream syncing to heterogeneous databases. These third-party tools generally utilize PostgreSQL&amp;rsquo;s own logical decoding, specify their own output plugin, and automatically create replication slots and replication links. Some tools automatically create subscriptions, while others only have replication slots without subscriptions.&lt;/p&gt;
&lt;p&gt;Having already understood logical decoding, output plugins, replication slots, replica identity, and prerequisites for replication, let&amp;rsquo;s simulate a PostgreSQL to Oracle sync by directly configuring the prerequisites and starting synchronization.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Creating OGG Sync from PostgreSQL to Oracle
 &lt;div id="creating-ogg-sync-from-postgresql-to-oracle" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-ogg-sync-from-postgresql-to-oracle" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Software Installation:&lt;/p&gt;
&lt;p&gt;ogg for oracle: Oracle GoldenGate 21.3.0.0.0 for Oracle on Linux x86-64&lt;/p&gt;
&lt;p&gt;ogg for pg: Oracle GoldenGate 21.3.0.0.0 for PostgreSQL on Linux x86-64&lt;/p&gt;
&lt;p&gt;oracle: 11.2.0.4&lt;/p&gt;
&lt;p&gt;pg: 13.10&lt;/p&gt;
&lt;p&gt;Installation steps:&lt;/p&gt;
&lt;p&gt;OGG installation and deployment won&amp;rsquo;t be introduced here. I followed the article&amp;rsquo;s installation steps step by step. Installation article reference: &lt;a href="https://liuzhilong.blog.csdn.net/article/details/129252320?spm=1001.2014.3001.5502" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net/article/details/129252320?spm=1001.2014.3001.5502&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Sync architecture diagram:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/af57ae30ee4a.png" alt="c8be5aae99704448a8a7e2e01fbde05b.png" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; slot_name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;ext_pg_5d4b1d39f7494f79&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;-------+------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ext_pg_5d4b1d39f7494f79
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_decoding &lt;span style="color:#75715e"&gt;-- OGG defaults to using test-decoding
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16385&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#75715e"&gt;-- As long as OGG extract is running, the replication slot is active
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3509&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;591&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F3E38
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F4020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wal_status &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reserved
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;safe_wal_size &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_replication
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3509&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt;GoldenGateCapture
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;43665&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;350469&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; streaming
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sent_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F4140
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F4020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;F4020
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;replay_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flush_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;replay_lag &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_priority &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_state &lt;span style="color:#f92672"&gt;|&lt;/span&gt; async
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;reply_time &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;986625&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- replay_lsn has no value
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Even lag has no value&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Logical Replication Monitoring
 &lt;div id="logical-replication-monitoring" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-replication-monitoring" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;An important method for logical replication lag monitoring is checking lag from the replication software. Without that, you can only check from the replication slot view. The replication slot view provides quite a lot of information, such as whether the replication slot is active directly indicating whether the replication link is syncing.&lt;/p&gt;
&lt;p&gt;The replication slot view is very important for logical replication monitoring. Some additional monitoring for publish-subscribe was introduced earlier. Here we focus on broader logical replication monitoring.&lt;/p&gt;

&lt;h4 class="relative group"&gt;pg_replication_slots
 &lt;div id="pg_replication_slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_replication_slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The replication slot view shows information about each replication slot and some slot statuses. Manually created slots or slots automatically created by tools and subscriptions are all displayed here.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;slot_name&lt;/th&gt;
 &lt;th&gt;Replication slot name&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;plugin&lt;/td&gt;
 &lt;td&gt;Output plugin name for logical replication slots. If empty, it&amp;rsquo;s a physical replication slot&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;slot_type&lt;/td&gt;
 &lt;td&gt;physical or logical&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;datoid&lt;/td&gt;
 &lt;td&gt;Database ID for logical replication slot&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;database&lt;/td&gt;
 &lt;td&gt;Database for logical replication slot&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;temporary&lt;/td&gt;
 &lt;td&gt;Whether it&amp;rsquo;s a temporary replication slot. Temporary slots are not written to disk and are automatically deleted when the session ends. pg_basebackup uses temporary slots by default&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;active&lt;/td&gt;
 &lt;td&gt;Replication slot status: t or f. If f, you should quickly consider restarting the replication link or deleting it, as it may block WAL log deletion and fill up the primary database disk. This is related to the max_slot_wal_keep_size parameter&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;active_pid&lt;/td&gt;
 &lt;td&gt;walsender PID using this replication slot. Only present when the slot status is t&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;xmin&lt;/td&gt;
 &lt;td&gt;Minimum transaction ID the slot needs to hold&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;catalog_xmin&lt;/td&gt;
 &lt;td&gt;Minimum catalog transaction ID the slot needs to hold&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;restart_lsn&lt;/td&gt;
 &lt;td&gt;LSN position of WAL the slot needs to retain to ensure downstream consumer&amp;rsquo;s required WAL won&amp;rsquo;t be cleaned. max_slot_wal_keep_size parameter is the maximum WAL size the slot needs to retain. Beyond this value, WAL can also be deleted. Default -1 means never cleaned. This value represents the LSN position after the downstream&amp;rsquo;s latest checkpoint consumption and can help locate replication link lag&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;confirmed_flush_lsn&lt;/td&gt;
 &lt;td&gt;LSN confirmed received by the logical replication downstream. Empty for physical replication slots&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;wal_status&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;Status of WAL claimed by this replication slot&lt;/code&gt; reserved: the slot reserves WAL, WAL hasn&amp;rsquo;t exceeded max_wal_size (auto-checkpoint interval) extended: the slot reserves WAL, WAL has exceeded max_wal_size but the slot still retains it. WAL in this state is still within wal_keep_size or max_slot_wal_keep_size unreserved: the slot no longer retains needed WAL, WAL will be deleted at next checkpoint lost: WAL needed by the slot has been cleaned, slot is invalid. &lt;code&gt;The last two states are seen only when max_slot_wal_keep_size is non-negative. This is easy to understand, since max_slot_wal_keep_size is the criterion for whether WAL can be deleted. Without a mechanism to delete slot WAL, unreserved and lost states wouldn't appear.&lt;/code&gt; &lt;code&gt;If restart_lsn is NULL, this field is null. Also easy to understand — if there's no WAL LSN, you can't know the WAL retention position or judge whether WAL has exceeded wal_keep_size or max_slot_wal_keep_size.&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;safe_wal_size&lt;/td&gt;
 &lt;td&gt;Number of WAL bytes that can be written before WAL files would be deleted. If this value is negative or zero, it means max_slot_wal_keep_size has been exceeded, and WAL files will be deleted as soon as a checkpoint occurs, requiring the standby using this slot to be rebuilt&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;pg_stat_replication
 &lt;div id="pg_stat_replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_stat_replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Rather than replication status, it&amp;rsquo;s more accurate to call it walsender status. This view shows the status of each walsender, one record per walsender.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If present in pg_replication_slots but not in pg_stat_replication, the walsender is gone; logical replication is down; pg_replication_slots active should be f.&lt;/li&gt;
&lt;li&gt;If absent in pg_replication_slots but present in pg_stat_replication, this is physical replication without a replication slot.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can have replication stat info without a replication slot. Replication slots with walsenders also need this view because it reveals more replication status info than pg_replication_slots.&lt;/p&gt;
&lt;p&gt;So when the replication slot hasn&amp;rsquo;t failed, pg_stat_replication is very important for monitoring logical replication lag.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;pid&lt;/th&gt;
 &lt;th&gt;walsender PID, same as pg_replication_slots active_pid&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;usesysid&lt;/td&gt;
 &lt;td&gt;User OID connected to this walsender, i.e., the downstream&amp;rsquo;s replication user OID&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;usename&lt;/td&gt;
 &lt;td&gt;Username connected to this walsender&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;application_name&lt;/td&gt;
 &lt;td&gt;Downstream application name. If subscription, it&amp;rsquo;s the subscription name. If pg_recvlogical, it&amp;rsquo;s pg_recvlogical&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;client_addr&lt;/td&gt;
 &lt;td&gt;Downstream IP. If empty, it&amp;rsquo;s a local socket connection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;client_hostname&lt;/td&gt;
 &lt;td&gt;Downstream hostname&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;client_port&lt;/td&gt;
 &lt;td&gt;Downstream port. If -1, it&amp;rsquo;s a local socket connection&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;backend_start&lt;/td&gt;
 &lt;td&gt;Backend start time, i.e., when downstream connected to walsender&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;backend_xmin&lt;/td&gt;
 &lt;td&gt;Standby&amp;rsquo;s xmin when hot_standby_feedback is enabled. This is clearly for physical replication&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;state&lt;/td&gt;
 &lt;td&gt;States are relatively easy to understand. startup: walsender starting. catchup: walsender catching up with primary logs. streaming: walsender has caught up with primary logs, normal replication state. backup: walsender sending backup, this state appears for walsender used for backup. stopping: walsender stopping&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sent_lsn&lt;/td&gt;
 &lt;td&gt;LSN sent&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;write_lsn&lt;/td&gt;
 &lt;td&gt;LSN written to disk by downstream&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;flush_lsn&lt;/td&gt;
 &lt;td&gt;LSN flushed to disk by downstream&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;replay_lsn&lt;/td&gt;
 &lt;td&gt;LSN replayed by downstream&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;write_lag&lt;/td&gt;
 &lt;td&gt;Log lag between primary flush wal and downstream write&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;flush_lag&lt;/td&gt;
 &lt;td&gt;Log lag between primary flush wal and downstream flush&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;replay_lag&lt;/td&gt;
 &lt;td&gt;Log lag between primary flush wal and downstream relay&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sync_priority&lt;/td&gt;
 &lt;td&gt;Synchronization priority&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sync_state&lt;/td&gt;
 &lt;td&gt;Synchronization state&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;reply_time&lt;/td&gt;
 &lt;td&gt;Last reply time&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h4 class="relative group"&gt;Relationship between sent_lsn, write_lsn, flush_lsn, replay_lsn
 &lt;div id="relationship-between-sent_lsn-write_lsn-flush_lsn-replay_lsn" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#relationship-between-sent_lsn-write_lsn-flush_lsn-replay_lsn" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/08ab66a5cd02.png" alt="f2a89e2dabf84e0794c1a5854bb2006f.png" /&gt;&lt;/p&gt;
&lt;p&gt;The above nicely shows the hierarchical relationship of sent_lsn, write_lsn, flush_lsn.&lt;/p&gt;
&lt;p&gt;These monitoring metrics look very much like streaming replication. For logical replication, sent_lsn, write_lsn, flush_lsn also generally have values.&lt;/p&gt;
&lt;p&gt;However, when logical replication doesn&amp;rsquo;t know what the downstream is, the replay log replay action may not exist, so logical replication may not have replay_lsn.&lt;/p&gt;
&lt;p&gt;But one thing is confirmed effective: sent_lsn.&lt;/p&gt;
&lt;p&gt;After reviewing pg_replication_slots and pg_stat_replication view monitoring, we find that neither shows log parsing delay; at most, you can see log transmission delay.&lt;/p&gt;

&lt;h4 class="relative group"&gt;pg_stat_replication_slots
 &lt;div id="pg_stat_replication_slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_stat_replication_slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;This view has been available since pg14. It specifically monitors logical replication slot status and can additionally monitor spill status. For pg13, you can only check the pg_replslot directory. Spill will be introduced later.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Logical Replication Slot Transaction Snapshots and pg_logical Directory
 &lt;div id="logical-replication-slot-transaction-snapshots-and-pg_logical-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-replication-slot-transaction-snapshots-and-pg_logical-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The transaction snapshots needed by replication slots are persisted to disk. The source code is in snapbuild.c.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildSerializationPoint&lt;/span&gt;(SnapBuild &lt;span style="color:#f92672"&gt;*&lt;/span&gt;builder, XLogRecPtr lsn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (builder&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; SNAPBUILD_CONSISTENT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildRestore&lt;/span&gt;(builder, lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildSerialize&lt;/span&gt;(builder, lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Snap persistence has two behaviors: one is restore, loading from disk to memory; the other is serialize, persisting from memory to disk.&lt;/p&gt;
&lt;p&gt;Transaction snapshot persistence:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildSerialize&lt;/span&gt;(SnapBuild &lt;span style="color:#f92672"&gt;*&lt;/span&gt;builder, XLogRecPtr lsn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(path, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_logical/snapshots/%X-%X.snap&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(uint32) (lsn &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;), (uint32) lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ret &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * somebody else has already serialized to this point, don&amp;#39;t overwrite
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * but remember location, so we don&amp;#39;t need to read old data again.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * To be sure it has been synced to disk after the rename() from the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * tempfile filename to the real filename, we just repeat the fsync.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * That ought to be cheap because in most scenarios it should already
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * be safely on disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(path, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_logical/snapshots&amp;#34;&lt;/span&gt;, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;builder&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;last_serialized_snapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; lsn;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;goto&lt;/span&gt; out;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Transaction snapshot loading into memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SnapBuildRestore&lt;/span&gt;(SnapBuild &lt;span style="color:#f92672"&gt;*&lt;/span&gt;builder, XLogRecPtr lsn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (builder&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;state &lt;span style="color:#f92672"&gt;==&lt;/span&gt; SNAPBUILD_CONSISTENT)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;sprintf&lt;/span&gt;(path, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_logical/snapshots/%X-%X.snap&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(uint32) (lsn &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;), (uint32) lsn);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;fd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;OpenTransientFile&lt;/span&gt;(path, O_RDONLY &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PG_BINARY);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The transactions needed by logical replication slots, before being committed, store dirty transaction data and unconsumed data under pg_logical/snapshots/. After committing data or starting the replication slot, data is handed to reorderbuffer; or after cleaning the replication slot, the data is released.&lt;/p&gt;
&lt;p&gt;My environment has a long-unused slot with restart_lsn at 0/1776858:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; slot_name,plugin,slot_type,&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;,active,restart_lsn &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; slot_name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+---------------+-----------+----------+--------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logical_test &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_decoding &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776858&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The oldest snapshot under pg_logical/snapshots/ is it:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl snapshots&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;300&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;144&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; 20:41 0-1776858.snap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;144&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; 20:44 0-1776900.snap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;144&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; 20:45 0-1776938.snap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Delete unwanted replication slot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_drop_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After a few minutes, snap is deleted:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#960050;background-color:#1e0010"&gt;@&lt;/span&gt;lzl snapshots]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776858.&lt;/span&gt;snap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ls: cannot access &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1776858.&lt;/span&gt;snap: No such file or directory&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Logical Decoding Working Memory and Spill to pg_replslot
 &lt;div id="logical-decoding-working-memory-and-spill-to-pg_replslot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical-decoding-working-memory-and-spill-to-pg_replslot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;logical_decoding_work_mem
 &lt;div id="logical_decoding_work_mem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#logical_decoding_work_mem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Before pg13, logical decoding would retain at most 4096 changes in memory (max_changes_in_memory hardcoded). Beyond 4096 changes, transaction data would be written to disk.&lt;/p&gt;
&lt;p&gt;pg13 introduced the logical_decoding_work_mem parameter. Working memory used by logical decoding. All walsender decoding uses this shared memory area. If the data held by logical decoding exceeds this memory value, it&amp;rsquo;s written to disk. Logical decoding working memory size defaults to 64MB.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Related ReorderBuffer and Spill
 &lt;div id="related-reorderbuffer-and-spill" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#related-reorderbuffer-and-spill" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Description in reorderbuffer.c:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; This module gets handed individual pieces of transactions in the order
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; toplevel transaction sized pieces. When a transaction is completely
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; reassembled &lt;span style="color:#f92672"&gt;-&lt;/span&gt; signaled by reading the transaction commit record &lt;span style="color:#f92672"&gt;-&lt;/span&gt; it
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; will then call the output &lt;span style="color:#a6e22e"&gt;plugin&lt;/span&gt; (cf. &lt;span style="color:#a6e22e"&gt;ReorderBufferCommit&lt;/span&gt;()) with the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; individual changes. The output plugins rely on snapshots built by
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; snapbuild.c which hands them to us.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When a transaction commits, reorderbuffer can receive transaction entries and sort them, then send data changes to the output plugin for output. The output plugin relies on snapshots built by snapbuild.c, which are handed to reorderbuffer.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Maximum number of changes kept in memory, per transaction. After that,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * changes are spooled to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The current value should be sufficient to decode the entire transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * without hitting disk in OLTP workloads, while starting to spool to disk in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * other workloads reasonably fast.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * At some point in the future it probably makes sense to have a more elaborate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * resource management here, but it&amp;#39;s not entirely clear what that would look
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * like.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; logical_decoding_work_mem;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; Size max_changes_in_memory &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt;; &lt;span style="color:#75715e"&gt;/* XXX for restore only */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When parsed data exceeds logical_decoding_work_mem, it&amp;rsquo;s written to disk. max_changes_in_memory is hardcoded at 4096, now only used to trigger disk restore. In pg12 source, there&amp;rsquo;s no int logical_decoding_work_mem, and subsequent serialization was also judged based on max_changes_in_memory.&lt;/p&gt;
&lt;p&gt;In pg13, Disk serialization source code starts from line 2333.
When parsed data in memory exceeds logical_decoding_work_mem, the largest transaction is spilled to disk.
ReorderBufferLargestTXN(rb) finds the largest transaction. ReorderBufferSerializeTXN(rb, txn) persists this transaction.
The immediately following code is ReorderBufferSerializeTXN():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Spill data of a large transaction (and its subtransactions) to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeTXN&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb, ReorderBufferTXN &lt;span style="color:#f92672"&gt;*&lt;/span&gt;txn)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dlist_iter subtxn_i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dlist_mutable_iter change_i;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; fd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogSegNo curOpenSegNo &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size spilled &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(DEBUG2, &lt;span style="color:#e6db74"&gt;&amp;#34;spill %u changes in XID %u to disk&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (uint32) txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nentries_mem, txn&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* do the same to all child TXs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At debug2 level, spill logs are output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Given a replication slot, transaction ID and segment number, fill in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * corresponding spill file into &amp;#39;path&amp;#39;, which is a caller-owned buffer of size
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * at least MAXPGPATH.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializedPath&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;path, ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot, TransactionId xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogSegNo segno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogRecPtr recptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;XLogSegNoOffsetToRecPtr&lt;/span&gt;(segno, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, wal_segment_size, recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(path, MAXPGPATH, &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/%s/xid-%u-lsn-%X-%X.spill&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#a6e22e"&gt;NameStr&lt;/span&gt;(MyReplicationSlot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;data.name),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (uint32) (recptr &lt;span style="color:#f92672"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;), (uint32) recptr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Persisted to pg_replslot/replication_slot_name/xid-%u-lsn-%X-%X.spill.&lt;/p&gt;
&lt;p&gt;Similarly, besides serialize, there&amp;rsquo;s also restore:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Restore a number of changes spilled to disk back into memory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; Size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferRestoreChanges&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb, ReorderBufferTXN &lt;span style="color:#f92672"&gt;*&lt;/span&gt;txn,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TXNEntryFile &lt;span style="color:#f92672"&gt;*&lt;/span&gt;file, XLogSegNo &lt;span style="color:#f92672"&gt;*&lt;/span&gt;segno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size restored &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogSegNo last_segno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (restored &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; max_changes_in_memory &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;segno &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; last_segno)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; readBytes;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ReorderBufferDiskChange &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ondisk;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Read the statically sized part of a change which has information
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * about the total size. If we couldn&amp;#39;t read a record, we&amp;#39;re at the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * end of this file.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferSerializeReserve&lt;/span&gt;(rb, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(ReorderBufferDiskChange));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;readBytes &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;FileRead&lt;/span&gt;(file&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;vfd, rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;outbuf,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(ReorderBufferDiskChange),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curOffset, WAIT_EVENT_REORDER_BUFFER_READ);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * ok, read a full change from disk, now restore it into proper
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * in-memory format
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferRestoreChange&lt;/span&gt;(rb, txn, rb&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;outbuf);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;restored&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; restored;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ReorderBufferRestoreChanges() just does judgment and looping (restored++), calling ReorderBufferRestoreChange():&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferRestoreChange&lt;/span&gt;(ReorderBuffer &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rb, ReorderBufferTXN &lt;span style="color:#f92672"&gt;*&lt;/span&gt;txn,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;data)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Update memory accounting for the restored change. We need to do this
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * although we don&amp;#39;t check the memory limit when restoring the changes in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * this branch (we only do that when initially queueing the changes after
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * decoding), because we will release the changes later, and that will
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * update the accounting too (subtracting the size from the counters). And
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * we don&amp;#39;t want to underflow there.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferChangeMemoryUpdate&lt;/span&gt;(rb, change, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ReorderBufferChangeSize&lt;/span&gt;(change));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Looking at ReorderBufferRestoreChanges(), its while loop judgment is restored &amp;lt; max_changes_in_memory, and restored starts at 0. It will loop 4096 times. There&amp;rsquo;s a comment in ReorderBufferRestoreChange explaining that although restore isn&amp;rsquo;t based on memory limit, it still needs to update memory usage to prevent underflow. Meaning: since I just restored it, don&amp;rsquo;t spill it again in a nested fashion.
(It feels a bit odd — clearly judging by memory limit would be better rather than hardcoding the restore loop count.)&lt;/p&gt;
&lt;p&gt;Interpreting the logical decoding process based on source code:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b6939610878b.png" alt="69b422c44d6d43e991eea0c8904e166c.png" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;xtransaction snap preserves the metadata needed for parsing locks. When the replication slot is inactive or the transaction is uncommitted, snap persists to pg_logical/snapshots/%restart_lsn.snap. After the replication slot restarts or the transaction commits, the transaction snap metadata on disk is read into memory and sent to reorderbuffer for WAL parsing, sorted by transaction start order. If logical decoding data fills up the logical_decoding_work_mem memory area, change entries persist the largest transaction to pg_replslot/slot_name/xid-%u-lsn-%X-%X.spill, send other in-memory transactions to the output plugin for format conversion, and finally send the decoded information to the downstream.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In fact, we can see that long transactions and large transactions can make the entire logical replication link very slow. Large transactions are preferentially spilled to disk, then loaded back from disk to memory after the transaction completes.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Logical replication is managed through replication slots: one replication slot, one walsender process, one output plugin.&lt;/li&gt;
&lt;li&gt;The output plugin determines the output form of logically decoded data, specified when creating the replication slot.&lt;/li&gt;
&lt;li&gt;Replica identity priority recommendation: primary key -&amp;gt; non-null unique index -&amp;gt; full.&lt;/li&gt;
&lt;li&gt;The publish-subscribe model is PostgreSQL&amp;rsquo;s built-in logical replication, using pgoutput by default. Publications can be used independently.&lt;/li&gt;
&lt;li&gt;The publisher process is walsender, and the subscriber process is worker. Pay attention to their respective process parameters.&lt;/li&gt;
&lt;li&gt;There are many third-party logical replication tools; they generally use PostgreSQL&amp;rsquo;s logical decoding system.&lt;/li&gt;
&lt;li&gt;For monitoring replication links, pay attention to pg_replication_slots and pg_stat_replication.&lt;/li&gt;
&lt;li&gt;The pg_logical directory stores transaction parsing metadata snaps, waiting for transaction commit before parsing.&lt;/li&gt;
&lt;li&gt;The pg_replslot directory stores transaction information exceeding logical_decoding_work_mem, called spill.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Book: 《PostgreSQL实战》&lt;/p&gt;
&lt;p&gt;Official Documentation:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/logicaldecoding.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: Chapter 49. Logical Decoding&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/logicaldecoding-example.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: 49.1. Logical Decoding Examples&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/app-pgrecvlogical.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: pg_recvlogical&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/14/view-pg-replication-slots.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 14: 52.81. pg_replication_slots&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/runtime-config-replication.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 13: 19.6. Replication&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/logicaldecoding-output-plugin.html#LOGICALDECODING-OUTPUT-PLUGIN-CALLBACKS" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 13: 48.6. Logical Decoding Output Plugins&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/logical-replication-publication.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: 31.1. Publication&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/logical-replication-subscription.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: 31.2. Subscription&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/sql-createpublication.html" target="_blank" rel="noreferrer"&gt;PostgreSQL: Documentation: 15: CREATE PUBLICATION&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Highly Recommended:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgconf.asia/JA/2017/wp-content/uploads/sites/2/2017/12/D2-A7-EN.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.anayrat.info/en/2018/03/10/logical-replication-internals/" target="_blank" rel="noreferrer"&gt;Logical replication internals | Select * from Adrien&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.highgo.ca/2019/08/22/an-overview-of-logical-replication-in-postgresql/" target="_blank" rel="noreferrer"&gt;An Overview of Logical Replication in PostgreSQL - Highgo Software Inc.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/4lF4LonDQeICPtbUX_HVnw" target="_blank" rel="noreferrer"&gt;Discussing Logical Decoding from Real Cases&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/yiukiiOa0snzcak1ThmP7Q" target="_blank" rel="noreferrer"&gt;Long-Troubling Logical Decoding Anomalies&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cybertec-postgresql.com/en/monitoring-replication-pg_stat_replication/" target="_blank" rel="noreferrer"&gt;Monitoring replication: pg_stat_replication - CYBERTEC&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other References:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://zhuanlan.zhihu.com/p/311496301" target="_blank" rel="noreferrer"&gt;https://zhuanlan.zhihu.com/p/311496301&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://dzone.com/articles/postgresql-change-data-capture" target="_blank" rel="noreferrer"&gt;A Guide to PostgreSQL Change Data Capture - DZone&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/change-data-capture-in-postgres-how-to-use-logical-decoding-and/ba-p/1396421" target="_blank" rel="noreferrer"&gt;Change data capture in Postgres: How to use logical decoding and wal2json - Microsoft Community Hub&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.kancloud.cn/taobaomysql/monthly/213790" target="_blank" rel="noreferrer"&gt;PgSQL · The Secrets of PostgreSQL Logical Streaming Replication Technology · Database Kernel Monthly · KanCloud&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/dafei1288/article/details/124629875" target="_blank" rel="noreferrer"&gt;Analyzing PostgreSQL Logical Replication Principles - CSDN Blog&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://pigsty.cc/zh/blog/2021/03/03/postgres" target="_blank" rel="noreferrer"&gt;http://pigsty.cc/zh/blog/2021/03/03/postgres&lt;/a&gt;逻辑复制详解/&lt;/p&gt;
&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/concepts-logical" target="_blank" rel="noreferrer"&gt;Logical replication and logical decoding - Azure Database for PostgreSQL - Flexible Server | Microsoft Learn&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Streaming Replication</title><link>https://lastdba.com/en/2024/08/13/postgresql-streaming-replication/</link><pubDate>Tue, 13 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/13/postgresql-streaming-replication/</guid><description>&lt;h4 class="relative group"&gt;What is PostgreSQL Streaming Replication?
 &lt;div id="what-is-postgresql-streaming-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-postgresql-streaming-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Streaming Replication is a method for transmitting WAL logs introduced in PostgreSQL 9.0. As soon as the primary database generates a log, it is immediately passed to the standby database.
Before PostgreSQL 9.0, PostgreSQL could only transfer WAL logs one at a time (log shipping), and the standby database lagged behind the primary by at least one WAL log.



&lt;img src="https://lastdba.com/img/csdn/973437b5ba70.png" alt="PG Streaming Replication Principle" /&gt;&lt;/p&gt;</description><content:encoded>
&lt;h4 class="relative group"&gt;What is PostgreSQL Streaming Replication?
 &lt;div id="what-is-postgresql-streaming-replication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-postgresql-streaming-replication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Streaming Replication is a method for transmitting WAL logs introduced in PostgreSQL 9.0. As soon as the primary database generates a log, it is immediately passed to the standby database.
Before PostgreSQL 9.0, PostgreSQL could only transfer WAL logs one at a time (log shipping), and the standby database lagged behind the primary by at least one WAL log.



&lt;img src="https://lastdba.com/img/csdn/973437b5ba70.png" alt="PG Streaming Replication Principle" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;PostgreSQL Streaming Replication Processes
 &lt;div id="postgresql-streaming-replication-processes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-streaming-replication-processes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;wal sender&lt;/strong&gt;: The wal sender exists on the primary database. The wal sender process transmits the WAL between the primary&amp;rsquo;s latest LSN and the standby&amp;rsquo;s latest LSN to the standby.
&lt;strong&gt;wal receiver&lt;/strong&gt;: The wal receiver exists on the standby database. The wal receiver process transmits the standby&amp;rsquo;s latest LSN to the primary. The wal receiver receives WAL data passed by the wal sender and writes it to WAL logs.
&lt;strong&gt;startup&lt;/strong&gt;: The standby instance recovery process. It replays WAL logs on the standby database.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;16776&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14632&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 13:33 ? 00:00:00 postgres: wal sender process lzl 172.17.100.150&lt;span style="color:#f92672"&gt;(&lt;/span&gt;13338&lt;span style="color:#f92672"&gt;)&lt;/span&gt; streaming 0/3002D30
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;16775&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15329&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 13:33 ? 00:00:00 postgres: wal receiver process streaming 0/3002D30
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;15330&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15329&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 10:26 ? 00:00:00 postgres: startup process recovering &lt;span style="color:#ae81ff"&gt;000000010000000000000003&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;PostgreSQL Streaming Replication Principles
 &lt;div id="postgresql-streaming-replication-principles" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-streaming-replication-principles" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL streaming replication is primarily divided into two phases: the instance recovery phase and the primary-standby synchronization phase.
&lt;strong&gt;Instance Recovery Phase&lt;/strong&gt;: When a PostgreSQL database crashes abnormally, upon startup, PostgreSQL replays all WAL logs after the last checkpoint before the crash (this is the same principle as instance recovery in Oracle, MySQL, and other relational databases — the goal is to bring the database to a consistent state). When setting up a PostgreSQL standby database, the primary is generally not shut down. At this point, the backup taken from the primary is in an inconsistent state, and the startup process performs instance recovery when the standby starts.
&lt;strong&gt;Primary-Standby Synchronization Phase&lt;/strong&gt;: The wal receiver process transmits the standby&amp;rsquo;s latest LSN to the primary. The wal sender transmits the WAL between the primary&amp;rsquo;s latest LSN and the standby&amp;rsquo;s latest LSN to the wal receiver. The wal receiver receives the WAL and writes it to disk, and the startup process replays the WAL logs on the standby.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Synchronous and Asynchronous
 &lt;div id="synchronous-and-asynchronous" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#synchronous-and-asynchronous" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL primary-standby has 5 modes, controlled by the &lt;code&gt;synchronous_commit&lt;/code&gt; parameter. The essence of the &lt;code&gt;synchronous_commit&lt;/code&gt; parameter is to control when the primary commits.
&lt;strong&gt;remote_apply&lt;/strong&gt;: The primary commits only after all standby databases have applied the WAL. This mode is synchronous — the primary and standby are consistent. Data that can be queried on the primary can definitely also be queried on the standby. In this mode there is no primary-standby lag, but it affects the primary commit time because the primary commit needs to wait for network transmission and standby application time.&lt;/p&gt;
&lt;p&gt;The meaning of synchronous_commit has two scenarios: with and without standby databases (when synchronous_standby_names is empty or non-empty):&lt;/p&gt;
&lt;p&gt;When synchronous_standby_names is non-empty:
&lt;strong&gt;remote_apply&lt;/strong&gt;: The standby has applied the WAL, only then can the primary commit. In this mode the primary and standby are synchronous.
&lt;strong&gt;on&lt;/strong&gt;: default. The primary commits when both primary and standby WAL have been written to disk. Similar to semi-synchronous, no data will be lost.
&lt;strong&gt;remote_write&lt;/strong&gt;: The primary commits when the standby has received the WAL and written the WAL log to the filesystem cache. At this point the standby has received the WAL but hasn&amp;rsquo;t flushed it to disk yet. If the OS crashes, data will be lost.
&lt;strong&gt;local&lt;/strong&gt;: The primary commits when its WAL is flushed to disk. This mode is asynchronous — the primary doesn&amp;rsquo;t need to confirm the standby&amp;rsquo;s status before committing.
&lt;strong&gt;off&lt;/strong&gt;: The primary can commit without its own WAL being flushed to disk. There is a risk of data loss. Not recommended.&lt;/p&gt;
&lt;p&gt;When synchronous_standby_names is empty:
(When synchronous_standby_names is empty, only on and off are effective for synchronous_commit. If set to remote_apply, remote_write, or local, they are still treated as on.)
&lt;strong&gt;on&lt;/strong&gt;: default. The database WAL must be written to disk before a transaction can commit.
&lt;strong&gt;off&lt;/strong&gt;: The primary can commit without its own WAL being flushed to disk. There is a risk of data loss. Not recommended.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary-Standby Synchronization Relationship&lt;/strong&gt;



&lt;img src="https://lastdba.com/img/csdn/43bd95ea31d5.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary-Standby Reliability&lt;/strong&gt;



&lt;img src="https://lastdba.com/img/csdn/647bc630a1ef.png" alt="Image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;Failover
 &lt;div id="failover" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#failover" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When the primary crashes, the standby needs to initiate failover, at which point the standby becomes the new primary. PostgreSQL does not provide a method to detect failures, but it does provide a method to activate the primary. (Typically, third-party tools call the PostgreSQL activation method, while primary-standby monitoring, primary crash detection, connection switching, etc. are not handled by PostgreSQL itself.)
PostgreSQL provides 2 methods to activate a standby as the primary: the trigger_file file and the pg_ctl promote command. (In PostgreSQL 12 and later, trigger_file becomes promote_trigger_file.)
Both trigger_file and pg_ctl promote can complete the task of activating the standby with a single command. The difference is that trigger_file requires the trigger_file configuration to be written in recovery.conf in advance.
Using trigger_file for primary-standby switchover (pg_ctl promote has the same effect and is simpler):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Configure trigger_file in the standby&amp;rsquo;s recovery.conf&lt;/li&gt;
&lt;li&gt;Shut down the primary&lt;/li&gt;
&lt;li&gt;touch trigger_file to start the old standby as the new primary&lt;/li&gt;
&lt;li&gt;Configure recovery.conf to start the old primary as the new standby&lt;/li&gt;
&lt;li&gt;Observe the new and old primary/standby databases&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Failover Example:&lt;/strong&gt;
Environment:
Primary	172.17.100.150	5432
Standby	172.17.100.150	5433&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Configure trigger_file in standby recovery.conf&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat recovery.conf|grep trigger
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;trigger_file &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;/pg/pg96data_sla/trigger.kenyon&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ll /pg/pg96data_sla/trigger.kenyon
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ls: cannot access /pg/pg96data_sla/trigger.kenyon: No such file or directory&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Simply configure the trigger file path in recovery.conf. The trigger file won&amp;rsquo;t appear until it&amp;rsquo;s created.&lt;/p&gt;
&lt;p&gt;Add configuration to standby postgres.conf&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;max_wal_senders &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#75715e"&gt;#max_wal_senders is the maximum number of sender processes, default is 0, so the standby must configure this before switchover&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hot_standby&lt;span style="color:#f92672"&gt;=&lt;/span&gt;on &lt;span style="color:#75715e"&gt;#Enable query functionality on standby&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2. Shut down the primary&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pg_ctl stop -D /pg/pg96data_pri -m fast
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; server to shut down.... &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;server stopped&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(Check if primary WAL has been fully applied by the standby: pg9.6- cd pg_xlog; pg 10+ cd pg_wal)&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ls -ltr|tail -n &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{print $NF}&amp;#39;&lt;/span&gt;|&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; read xlog;&lt;span style="color:#66d9ef"&gt;do&lt;/span&gt; pg_xlogdump $xlog;&lt;span style="color:#66d9ef"&gt;done&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Look for the keyword &amp;ldquo;shutdown&amp;rdquo; in the standby&amp;rsquo;s WAL&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. touch to activate standby (or pg_ctl promote -D /pg/pg96data_sla)&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ touch /pg/pg96data_sla/trigger.kenyon&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At this point recovery.conf becomes recovery.done&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Set up primary as standby&lt;/strong&gt;
Configure the new standby&amp;rsquo;s recovery.conf file. You can directly copy from the old standby and modify the IP and directory.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vi $新备库/recover.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;standby_mode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;primary_conninfo &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;host=172.17.100.150 port=5433 user=lzl password=lzl&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recovery_target_timeline &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;latest&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Configure postgres.conf, write hot_standby = on to enable queries on the standby&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vi $新备库/postgres.conf
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;hot_standby &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Start the new standby&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/pg/pg96/bin/pg_ctl -D /pg/pg96data_pri -l /pg/pg96data_pri/server.log start&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;5. Check primary and standby&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# \x&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Expanded display is on.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# select * from pg_stat_replication ;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-&lt;span style="color:#f92672"&gt;[&lt;/span&gt; RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;]&lt;/span&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid | &lt;span style="color:#ae81ff"&gt;24766&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid | &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename | lzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name | walreceiver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr | 172.17.100.150
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname | 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port | &lt;span style="color:#ae81ff"&gt;47345&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start | 2021-07-30 07:44:05.582546+00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin | 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state | streaming
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sent_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;flush_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;replay_location | 0/4033790
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_priority | &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sync_state | async&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;pg_basebackup
 &lt;div id="pg_basebackup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_basebackup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;pg_basebackup is PostgreSQL&amp;rsquo;s built-in backup tool for performing base backups. pg_basebackup can be used for PITR and also for constructing log-shipping standby and streaming standby. It is PostgreSQL&amp;rsquo;s physical backup tool.
&lt;a href="https://liuzhilong.blog.csdn.net/article/details/119533506" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net/article/details/119533506&lt;/a&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;pg_rewind
 &lt;div id="pg_rewind" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_rewind" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;pg_rewind can be used as a maintenance tool for PostgreSQL primary-standby setups. When the timelines of two PostgreSQL instances diverge, pg_rewind can synchronize between the instances. (For example, if the standby is running after failover while the primary was still running, the timelines of primary and standby will have diverged.)
&lt;a href="https://liuzhilong.blog.csdn.net/article/details/119250794" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net/article/details/119250794&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Replication Slots
 &lt;div id="replication-slots" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replication-slots" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What are PostgreSQL Replication Slots?&lt;/strong&gt;
In a primary-standby architecture, if the standby hasn&amp;rsquo;t received WAL logs yet but the primary has already deleted them, such lag cannot be automatically recovered. Replication slots ensure that the primary won&amp;rsquo;t delete WAL logs that haven&amp;rsquo;t been transmitted to the standby yet.
Without replication slots, you might need to use wal_keep_size/wal_keep_segments and archive_command to ensure WAL logs aren&amp;rsquo;t deleted, but this approach always retains too many WAL files and cannot guarantee that WAL won&amp;rsquo;t be deleted when lag is significant. This is exactly why replication slots were created.
However, replication slots may cause the primary to never delete WAL (e.g., if the standby has crashed), causing disk space to fill up. In this case, max_slot_wal_keep_size is needed to set an upper limit on WAL file retention.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Replication Slot Parameters:&lt;/strong&gt;
&lt;strong&gt;max_slot_wal_keep_size&lt;/strong&gt;: When replication slots are in use, this parameter defines the maximum size of WAL files in the pg_wal directory. The default value is -1, meaning there is no upper limit on the size of WAL files retained by the primary for the standby.
&lt;strong&gt;wal_keep_segments&lt;/strong&gt;/&lt;strong&gt;wal_keep_size&lt;/strong&gt;: PostgreSQL 12 and below use wal_keep_segments, PostgreSQL 13 and above use wal_keep_size. Ensures that WAL files under pg_wal are not deleted. Without replication slots, WAL files exceeding this size may be deleted, potentially causing the standby to be unable to catch up. If set too large, it may cause the directory to grow excessively. The default is 0, meaning WAL files are not retained. If WAL is deleted, the following error may occur:
&lt;code&gt;ERROR: requested WAL segment xxxx has already been removed&lt;/code&gt;
At this point the standby can only hope for archives; otherwise, it must be rebuilt.
&lt;strong&gt;primary_slot_name&lt;/strong&gt;: Sets the slot name, indicating that the PostgreSQL primary-standby setup uses replication slots. So enabling PostgreSQL replication slots requires at least the following configuration:
primary_conninfo = &amp;lsquo;host=172.17.100.150 port=5433 user=lzl password=lzl&amp;rsquo;
primary_slot_name = &amp;lsquo;pg_slot_lzl&amp;rsquo;
&lt;strong&gt;max_replication_slots&lt;/strong&gt;: The maximum number of replication slots. Takes effect upon restart. If there aren&amp;rsquo;t enough replication slots, the standby will fail to start. This value should be set relatively high. In PostgreSQL versions below 9.6, the default is 0; in PostgreSQL 10 and above, it&amp;rsquo;s 10.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Creating PostgreSQL Replication Slots&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Set max_replication_slots on the primary&lt;/strong&gt;
Primary: (my PostgreSQL version is 9.6)
max_replication_slots=10
Add to postgres.conf and restart the primary&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Create replication slot&lt;/strong&gt;
Create replication slot:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_create_physical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;pg_slot_lzl&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xlog_position 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_slot_lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View replication slot&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; slot_name, slot_type, active &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----------+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_slot_lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; physical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Set primary_slot_name on the standby&lt;/strong&gt;
&lt;code&gt;primary_slot_name = 'pg_slot_lzl'&lt;/code&gt;
Add to recovery.conf and restart the standby&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Check replication slot&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;,pg_xlogfile_name(restart_lsn)&lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; current_xxlog &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; current_xxlog 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------+--------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_slot_lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; physical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12802&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A002340 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00000002000000000000000&lt;/span&gt;A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--pg_xlogfile_name(restart_lsn) to view current WAL log info&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Query Conflicts
 &lt;div id="query-conflicts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#query-conflicts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;What are Query Conflicts?&lt;/strong&gt;
The standby may encounter the following error during queries:
&lt;code&gt;ERROR：canceling statement due to conflict with recovery&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Why do conflicts occur? Let&amp;rsquo;s think carefully. For example, if the standby is executing a query based on a certain table (this query could be from an application or a manual connection), and the primary executes a drop table operation, this operation is written to WAL logs and transmitted to the standby for application. To ensure data consistency, PostgreSQL will inevitably replay the data quickly, at which point the drop table and select will conflict, as shown below:



&lt;img src="https://lastdba.com/img/csdn/2d333af63baa.png" alt="Query conflict during DDL" /&gt;&lt;/p&gt;
&lt;p&gt;Conflict scenarios:
The above only introduces one type of query conflict. To summarize, there are several situations:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Primary exclusive locks (including explicit LOCK commands and various DDL operations)&lt;/li&gt;
&lt;li&gt;Primary vacuum cleaning up dead tuples — if the standby is using those tuples, a conflict will occur&lt;/li&gt;
&lt;li&gt;Primary drops the tablespace that the standby query is using&lt;/li&gt;
&lt;li&gt;Primary drops the database that the standby is using



&lt;img src="https://lastdba.com/img/csdn/2194edd3e8af.png" alt="Query conflict during vacuum" /&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Consider a primary-only scenario:
Scenario 1: A session issues a drop table and finds that a select statement is currently executing. The session can only wait for the select to complete its transaction.
Scenario 2: A session issues a vacuum or automatic background vacuum — it won&amp;rsquo;t conflict with current database queries because vacuum won&amp;rsquo;t clean up tuples that are in use.&lt;/p&gt;
&lt;p&gt;The standby&amp;rsquo;s handling is different. Because the primary doesn&amp;rsquo;t know the standby&amp;rsquo;s transaction status, and the standby needs to stay consistent with the primary, this is why &amp;ldquo;query conflicts&amp;rdquo; occur.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Query Conflict Parameters&lt;/strong&gt;
&lt;strong&gt;hot_standby_feedback:&lt;/strong&gt;
This is the most frequently mentioned parameter in the topic of query conflicts. Let&amp;rsquo;s explore it in detail below. Suppose, without a standby, Session 1 queries a row of data, Session 2 deletes that data and commits. Then Session 2 performs a vacuum. We know this vacuum won&amp;rsquo;t delete that row because Session 1&amp;rsquo;s transaction still needs to use that tuple, so it won&amp;rsquo;t be cleaned up. What about in a primary-standby setup? How does the primary know that the standby is still querying when it&amp;rsquo;s about to perform a vacuum? This is the purpose of this parameter. After setting hot_standby_feedback, the standby will periodically notify the primary of the minimum active transaction ID (xmin) value, so the primary vacuum process won&amp;rsquo;t clean up tuples with values greater than xmin.
This parameter helps reduce conflicts but cannot completely avoid them. If you think about it carefully, this parameter only reduces conflicts caused by the primary vacuuming dead tuples — it cannot resolve conflicts caused by exclusive locks. Or conflicts caused by network interruptions: if the network between primary and standby is interrupted, the standby cannot send the xmin value to the primary normally. If the interruption is long enough, the primary will still clean up useless tuples during this period, and after the network recovers, the vacuum conflict described above may occur.
It&amp;rsquo;s worth noting that the hot_standby_feedback parameter won&amp;rsquo;t override the value limited by the old_snapshot_threshold parameter on the primary. The old_snapshot_threshold parameter limits the infinite expansion of dead tuples. When transaction information exceeds the old_snapshot_threshold limit, cleanup will still occur.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;max_standby_streaming_delay:&lt;/strong&gt;
The waiting time before the standby cancels a query due to a conflict caused by receiving WAL stream logs. Setting this parameter means that when a conflict occurs, the standby query won&amp;rsquo;t be immediately canceled but will wait for a period before throwing an error if it hasn&amp;rsquo;t finished. The value can be set based on the expected runtime of potential long transactions on the standby.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;max_standby_archive_delay:&lt;/strong&gt;
The waiting time before the standby cancels a query due to a conflict caused by processing archived WAL logs. Similar to the parameter above.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;vacuum_defer_cleanup_age:&lt;/strong&gt;
Specifies the number of transactions by which vacuum delays cleaning up dead tuples. Vacuum will delay clearing invalid records. The number of deferred transactions is set through vacuum_defer_cleanup_age. That is, vacuum and vacuum full operations won&amp;rsquo;t immediately clean up recently deleted tuples.&lt;/p&gt;
&lt;p&gt;You can view conflict occurrences through the pg_stat_database and pg_stat_database_conflicts views.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Other Related Parameters
 &lt;div id="other-related-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#other-related-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Transmission Parameters&lt;/strong&gt;
&lt;strong&gt;max_wal_senders&lt;/strong&gt;: The maximum number of services that can fetch WAL using wal sender, i.e., the maximum number of standby databases + basebackup clients. PostgreSQL 9.6 defaults to 0; PostgreSQL 10 and later default to 10.
&lt;strong&gt;wal_send_timeout&lt;/strong&gt;: Interrupt replication after WAL transmission fails for xx seconds. When the standby crashes or the network is interrupted for a long time, WAL will no longer attempt transmission. Default is 60. 0 means never interrupt replication.
&lt;strong&gt;track_commit_timestamp&lt;/strong&gt;: Record transaction timestamps. Default is off.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Primary Parameters&lt;/strong&gt;
&lt;strong&gt;synchronous_standby_names&lt;/strong&gt;:
Configured on the primary. The standby replication list. There are several forms (s1, s2, s3 represent the standby&amp;rsquo;s application_name, configured in recovery.conf):
synchronous_standby_names=&amp;lsquo;s1&amp;rsquo; means the primary can commit when s1 standby returns.
synchronous_standby_names=&amp;lsquo;FIRST 2 (s1,s2,s3)&amp;rsquo; means the primary can commit when the first two of the three standbys (s1 and s2) return.
synchronous_standby_names=&amp;lsquo;ANY 2 (s1,s2,s3)&amp;rsquo; means the primary can commit when any two of the three standbys return.
synchronous_standby_names=&amp;rsquo;&lt;em&gt;&amp;rsquo; means matching any host — the primary can commit when any host returns.
&lt;strong&gt;wal_level&lt;/strong&gt;:
WAL log level. This parameter determines how much information is written to WAL logs. The default is replica, which supports replication and WAL archiving while also supporting standby read-only queries.
minimal: Other than records needed for instance crash recovery, nothing else is recorded. For example, CREATE TABLE AS, CREATE INDEX, CLUSTER, COPY can be skipped. The log information recorded in this mode is insufficient to support WAL archiving and streaming replication.
logical: Adds additional information on top of replica to support logical decoding. This mode increases WAL log volume, especially for databases with many UPDATE and DELETE operations.
Before PostgreSQL 9.6, there were also archive and hot_standby modes, which map to the current replica mode.
&lt;strong&gt;synchronous_commit&lt;/strong&gt;:
As discussed earlier, 5 modes, each with pros and cons.
&lt;strong&gt;archive_mode&lt;/strong&gt;: archive_mode = on enables archiving.
&lt;strong&gt;archive_command&lt;/strong&gt;: Archiving command. PostgreSQL archiving directly calls operating system commands. Can be a simple cp command to the backup side.
&lt;strong&gt;listen_addresses&lt;/strong&gt;: Listening addresses. &amp;lsquo;&lt;/em&gt;&amp;rsquo; means listen on all IPs. Default is local.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Standby Parameters&lt;/strong&gt;
&lt;strong&gt;hot_standby&lt;/strong&gt;: on enables standby read-only queries.
&lt;strong&gt;primary_conninfo&lt;/strong&gt;: The connection string for the standby to connect to the primary. E.g., primary_conninfo = &amp;lsquo;host=172.17.100.150 port=5432 user=lzl password=lzl&amp;rsquo;.
&lt;strong&gt;trigger_file/promote_trigger_file&lt;/strong&gt;: The trigger file for activating the standby. Before PostgreSQL 12 it&amp;rsquo;s called trigger_file; PostgreSQL 12 and later use promote_trigger_file.
Both trigger_file and pg_ctl promote can activate the standby with a single command, as demonstrated earlier.
&lt;strong&gt;wal_receiver_create_temp_slot&lt;/strong&gt;: When there is no slot, temporarily create one (named after primary_slot_name). Default is off.&lt;/p&gt;

&lt;h4 class="relative group"&gt;References:
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;《The Way of PostgreSQL》(修炼之道)
&lt;a href="https://www.postgresql.org/docs/current/warm-standby.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/warm-standby.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/high-availability.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/high-availability.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/runtime-config-replication.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/runtime-config-replication.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/runtime-config-wal.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/runtime-config-wal.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/app-pgbasebackup.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/app-pgbasebackup.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.tencent.com/developer/article/1555354" target="_blank" rel="noreferrer"&gt;https://cloud.tencent.com/developer/article/1555354&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/29737" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/29737&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Streaming_Replication" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Streaming_Replication&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/" target="_blank" rel="noreferrer"&gt;https://www.percona.com/blog/2018/09/07/setting-up-streaming-replication-postgresql/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cybertec-postgresql.com/en/the-synchronous_commit-parameter/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/the-synchronous_commit-parameter/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/m15217321304/article/details/88850146" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/m15217321304/article/details/88850146&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.51cto.com/lishiyan/2460518?source=dra" target="_blank" rel="noreferrer"&gt;https://blog.51cto.com/lishiyan/2460518?source=dra&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Brief Analysis of Linux Memory</title><link>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-linux-memory/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-linux-memory/</guid><description>&lt;h2 class="relative group"&gt;Basic Memory Concepts
 &lt;div id="basic-memory-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#basic-memory-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Operating system memory is very important and fairly complex. Many knowledge points need to be mastered to further analyze program issues. Since this is the first comprehensive and systematic exposure to OS memory, the goal is to understand Linux memory concepts thoroughly and at a low level without diving deep into principles, so this chapter will also try to avoid Linux source code knowledge.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Basic Memory Concepts
 &lt;div id="basic-memory-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#basic-memory-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Operating system memory is very important and fairly complex. Many knowledge points need to be mastered to further analyze program issues. Since this is the first comprehensive and systematic exposure to OS memory, the goal is to understand Linux memory concepts thoroughly and at a low level without diving deep into principles, so this chapter will also try to avoid Linux source code knowledge.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Physical Memory and Virtual Memory
 &lt;div id="physical-memory-and-virtual-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#physical-memory-and-virtual-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e9d3726e966d.png" alt="Insert image description" /&gt;
(&lt;a href="https://en.wikipedia.org/wiki/Memory_address" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Memory_address&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Physical Memory&lt;/strong&gt;: Physical memory is the actual hardware memory present in a computer system, typically in the form of RAM (Random Access Memory).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Virtual Memory&lt;/strong&gt;: Virtual memory is a linear region that has not been allocated actual physical memory. Programs think they have a larger address space than the actual physical memory. The implementation of virtual memory allows programs to access a larger address range than physical memory without requiring all data to be present in physical memory simultaneously. The kernel releases physical pages by releasing linear regions, finding the corresponding physical pages, and releasing them all.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Memory Management Unit (MMU)&lt;/strong&gt;: A hardware component responsible for converting virtual addresses used by programs into physical addresses where data is actually stored in physical memory. The MMU&amp;rsquo;s primary task is to perform address mapping.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Page Table&lt;/strong&gt;: A page table is a data structure used to store the mapping between virtual address space and physical address space. When a program attempts to access virtual memory, the MMU determines the corresponding physical address by querying the page table.&lt;/p&gt;
&lt;p&gt;System call flow:



&lt;img src="https://lastdba.com/img/csdn/b1b0da7b7d74.png" alt="Insert image description" /&gt;
&lt;a href="https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf" target="_blank" rel="noreferrer"&gt;https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;(The image is a bit blurry, the topmost text is &amp;ldquo;User Space|Kernel Space&amp;rdquo;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;User programs can only access the kernel system through C libraries or system calls; user programs cannot directly access the kernel system&lt;/li&gt;
&lt;li&gt;The kernel system accesses physical memory through the MMU; it accesses disks and other external devices through drivers&lt;/li&gt;
&lt;li&gt;The virtual memory system (VM Subsystem in the figure above) includes buddy, slab algorithms, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;User Space and Kernel Space
 &lt;div id="user-space-and-kernel-space" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#user-space-and-kernel-space" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The process virtual address space is divided into user space and kernel space.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;User Space&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The space where user processes run in memory&lt;/li&gt;
&lt;li&gt;This portion of space is protected, and the system prevents other processes from accessing it (except for shared memory)&lt;/li&gt;
&lt;li&gt;However, kernel processes can directly access user processes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Kernel Space&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kernel space is the space used by kernel processes&lt;/li&gt;
&lt;li&gt;In kernel space, the operating system&amp;rsquo;s kernel code runs with higher privilege levels, allowing direct access to system hardware, process management, file system operations, etc.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Context Switching:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When a user program needs to access system services or perform operations requiring higher privileges, a context switch from user space to kernel space is triggered.&lt;/li&gt;
&lt;li&gt;Context switching is an operating system mechanism for saving and restoring program state, ensuring no data loss occurs when switching between user programs and the kernel.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The division between user space and kernel space is to provide security isolation, preventing user programs from directly affecting critical parts of the operating system. Early operating systems and DOS did not distinguish between kernel and user space, so a single program&amp;rsquo;s error or malicious behavior could affect the entire system.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4b446b757f77.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.zhihu.com/tardis/zm/art/66794639?source_id=1003" target="_blank" rel="noreferrer"&gt;https://www.zhihu.com/tardis/zm/art/66794639?source_id=1003&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;32-bit systems: Total 4GB address space, 3G UserSpace | 1G KernelSpace&lt;/p&gt;
&lt;p&gt;64-bit systems: Total 256TB address space, 128T UserSpace | 128T KernelSpace&lt;/p&gt;
&lt;p&gt;&lt;em&gt;2^32=4GB, 2^64=16777216TB, why does a 64-bit system only have 256TB address space?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://en.wikipedia.org/wiki/64-bit_computing" target="_blank" rel="noreferrer"&gt;64-bit computing wiki&lt;/a&gt; has an explanation. In short, 256TB (256 × 1024^4 bytes) of memory addresses is sufficient, and currently and in the imaginable future there won&amp;rsquo;t be 16EB (16 × 1024^6 bytes) of memory.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Process Virtual Address Space
 &lt;div id="process-virtual-address-space" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#process-virtual-address-space" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Each process typically has its own independent virtual memory space. Virtual memory is an abstract concept that provides each running process with an address space that appears continuous and private, making each process feel like it has the entire computer system&amp;rsquo;s full memory.&lt;/p&gt;
&lt;p&gt;Process virtual address space layout:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/94df008e9d4a.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2bc35848088f.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://www.sohu.com/a/392831824_467784" target="_blank" rel="noreferrer"&gt;https://www.sohu.com/a/392831824_467784&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The mmap mapping region expands from top to bottom, and the mmap mapping region and heap expand relative to each other until the remaining area in the virtual address space is exhausted. This structure facilitates the C runtime library&amp;rsquo;s use of the mmap mapping region and heap for memory allocation.&lt;/li&gt;
&lt;li&gt;Stack: Stores local variables and function parameters during program execution, growing from high addresses to low addresses&lt;/li&gt;
&lt;li&gt;Heap: Dynamic memory allocation area, managed through functions like malloc, new, free, and delete&lt;/li&gt;
&lt;li&gt;BSS (Uninitialized Variables): Stores uninitialized global variables and static variables&lt;/li&gt;
&lt;li&gt;Data: Stores global variables and static variables with predefined values in source code&lt;/li&gt;
&lt;li&gt;Text: Stores read-only program execution code, i.e., machine instructions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Process virtual address space distribution and mapping:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/92827b8dcc73.png" alt="Insert image description" /&gt;
(&lt;a href="https://velog.io/@mysprtlty/%EA%B0%80%EC%83%81-%EB%A9%94%EB%AA%A8%EB%A6%AC%EC%99%80-%EA%B0%80%EC%83%81-%EC%A3%BC%EC%86%8C-%EA%B3%B5%EA%B0%84" target="_blank" rel="noreferrer"&gt;https://velog.io/@mysprtlty/%EA%B0%80%EC%83%81-%EB%A9%94%EB%AA%A8%EB%A6%AC%EC%99%80-%EA%B0%80%EC%83%81-%EC%A3%BC%EC%86%8C-%EA%B3%B5%EA%B0%84&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;Shared Memory
 &lt;div id="shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;As mentioned earlier, the user space in the virtual address space cannot be accessed by other user processes. If multi-process user access to the same memory data is implemented through the kernel area, context switching cannot be avoided. Multi-process applications clearly need inter-process access, so a method that directly allows user processes to access the same physical memory emerged — this is shared memory.&lt;/p&gt;
&lt;p&gt;Shared memory is one of the mechanisms for implementing IPC (Inter Process Communication), with other methods including message queues and semaphores.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d969a23e8ba9.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.geeksforgeeks.org/inter-process-communication-ipc/" target="_blank" rel="noreferrer"&gt;https://www.geeksforgeeks.org/inter-process-communication-ipc/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Since it is inherently multiple virtual memory address spaces corresponding to one physical memory address space, you just need to point a segment in the address spaces of two processes to the same physical memory.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/1bc1a1357c78.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.softprayog.in/programming/interprocess-communication-using-system-v-shared-memory-in-linux" target="_blank" rel="noreferrer"&gt;https://www.softprayog.in/programming/interprocess-communication-using-system-v-shared-memory-in-linux&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Shared memory (seems like) has many implementation methods. For example, PostgreSQL defaults to using mmap to implement shared memory, refer to the &lt;a href="https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-SHARED-MEMORY-TYPE" target="_blank" rel="noreferrer"&gt;shared_memory_type parameter&lt;/a&gt; and &lt;a href="https://www.postgresql.org/docs/current/kernel-resources.html" target="_blank" rel="noreferrer"&gt;Managing Kernel Resources&lt;/a&gt;. Other shared memory implementations can be found in this article: &lt;a href="https://cloud.tencent.com/developer/article/1551288" target="_blank" rel="noreferrer"&gt;Song Baohua: The Best Shared Memory in the World (The Most Thorough Linux Shared Memory Article)&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Page Table
 &lt;div id="page-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#page-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The process virtual address space is per-process, while there is only one physical memory space. So how do you map and convert virtual memory and shared memory?&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/02d5376a22ed.png" alt="Insert image description" /&gt;
(&lt;a href="https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf" target="_blank" rel="noreferrer"&gt;https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The page table is where the correspondence between virtual memory addresses and physical memory addresses is stored.&lt;/strong&gt; (There are concepts like MMU and TLB here — let&amp;rsquo;s simplify and just think of it as the virtual-to-physical memory conversion function (PAGING), and only look at the page table here). A page table consists of a set of Page Table Entries (PTEs), with each PTE storing the map between a virtual page and a physical page.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3e31e4f9f0eb.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Although a single page table can implement memory-to-virtual-memory conversion, implementing it directly this way would consume too much memory for the page table itself.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4e25eb557be1.png" alt="Insert image description" /&gt;
(&lt;a href="https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf" target="_blank" rel="noreferrer"&gt;https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Therefore, the single page table needs to be subdivided: two-level page tables and four-level page tables.&lt;/p&gt;
&lt;p&gt;Two-level page tables:&lt;/p&gt;
&lt;p&gt;A two-level page table is a further subdivision of a single page table. 4G of space requires 4M of page tables to store the mapping table. If these 4M are divided into 1K pages (4K each), these 1K pages also need a table for management, which we call the &lt;strong&gt;page directory table&lt;/strong&gt;. This page directory table has 1K entries, each 4 bytes, making the page directory table size 4K as well.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/527ca245cbef.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Four-level page tables:&lt;/p&gt;
&lt;p&gt;For 64-bit systems, two-level page tables are insufficient; four-level page tables are needed.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/aec3c7ac7449.png" alt="Insert image description" /&gt;
(&lt;a href="https://maodanp.github.io/2019/06/02/linux-virtual-space/" target="_blank" rel="noreferrer"&gt;https://maodanp.github.io/2019/06/02/linux-virtual-space/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Check page table size:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl 2345&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat /proc/meminfo |grep PageTables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PageTables: &lt;span style="color:#ae81ff"&gt;46736&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;NUMA
 &lt;div id="numa" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#numa" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Uniform Memory Access (UMA)&lt;/strong&gt;: All CPUs have equivalent access time to memory. The problem with UMA is that multiple processors access memory through a single bus, increasing the load on the shared bus. Multiple processors contend for the memory controller causing conflicts. Additionally, the bus bandwidth is limited, leading to access delays.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Non-Uniform Memory Access (&lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-virtualization_tuning_optimization_guide-numa" target="_blank" rel="noreferrer"&gt;NUMA&lt;/a&gt;)&lt;/strong&gt;: A small group of CPUs access their own local memory together. When there are multiple groups of CPUs and their memory groups, each group of CPUs and memory constitutes a NUMA node.&lt;/p&gt;
&lt;p&gt;UMA:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8aa08bee1125.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;NUMA:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5c9b2c4ad417.png" alt="Insert image description" /&gt;
(&lt;a href="https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf" target="_blank" rel="noreferrer"&gt;https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Basic NUMA characteristics&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CPU access to local node memory is faster than remote&lt;/li&gt;
&lt;li&gt;By default, Linux prioritizes allocating local memory on the CPU; the policy can be configured&lt;/li&gt;
&lt;li&gt;Each node has its own memory structure&lt;/li&gt;
&lt;li&gt;NUMA is not suitable for all scenarios; it requires adaptation by upper-layer applications&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;NUMA balancing&lt;/em&gt;:
Achieves local access by automatically transferring tasks to remote CPUs or copying remote data to local memory. Enabled by default on Red Hat 7.&lt;/p&gt;
&lt;p&gt;Transferring tasks or copying data itself consumes resources and can slow down tasks. This feature may not be suitable for some applications; for example, Oracle&amp;rsquo;s Exadata has targeted NUMA optimizations.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;numactl&lt;/em&gt;:
NUMA OS configuration tool.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;numactl --show&lt;/code&gt; displays CPU and node information. Below is an example of 4 nodes with 64c 256g total, each node having 16c 64g:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;available: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; nodes &lt;span style="color:#f92672"&gt;(&lt;/span&gt;0-3&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; cpus: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;33&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;34&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; size: &lt;span style="color:#ae81ff"&gt;65418&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; free: &lt;span style="color:#ae81ff"&gt;310&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; cpus: &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;45&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;46&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;47&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; size: &lt;span style="color:#ae81ff"&gt;65536&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; free: &lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; cpus: &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;48&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;51&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;53&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;54&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; size: &lt;span style="color:#ae81ff"&gt;65536&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; free: &lt;span style="color:#ae81ff"&gt;82&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; cpus: &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;57&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;58&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;61&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; size: &lt;span style="color:#ae81ff"&gt;65536&lt;/span&gt; MB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;node &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; free: &lt;span style="color:#ae81ff"&gt;43&lt;/span&gt; MB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Zone
 &lt;div id="zone" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#zone" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;NUMA divides CPUs and memory into multiple nodes (node 0, node 1, node 2&amp;hellip;). In UMA structures, the CPU memory as a whole can be viewed as node 0.&lt;/p&gt;
&lt;p&gt;In Linux, each node is represented by the data structure &lt;code&gt;struct pglist_data&lt;/code&gt;, with the data type &lt;code&gt;typedef pg_data_t&lt;/code&gt;. Each node is further divided into multiple zones. A zone&amp;rsquo;s data structure is &lt;code&gt;zone_t&lt;/code&gt;, with the data type &lt;code&gt;zone_struct&lt;/code&gt;. There are generally 3 types: &lt;code&gt;ZONE_DMA&lt;/code&gt;, &lt;code&gt;ZONE_NORMAL&lt;/code&gt;, &lt;code&gt;ZONE_HIGHMEM&lt;/code&gt;, each with different functions.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8507ec262240.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.kernel.org/doc/gorman/html/understand/understand005.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/gorman/html/understand/understand005.html&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Zone distribution and functions in 32-bit:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ZONE_DMA&lt;/code&gt;: (&amp;lt;16MB), &lt;em&gt;Direct Memory Access&lt;/em&gt; (DMA), the ancient 16 MiB limit, includes &lt;a href="https://en.wikipedia.org/wiki/Industry_Standard_Architecture" target="_blank" rel="noreferrer"&gt;ISA devices&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ZONE_DMA32&lt;/code&gt;: Since many devices encounter problems accessing memory that cannot be addressed with 32 bits, this zone was added in x86-64. This zone only exists in x86-64 architecture. (See &lt;a href="https://lwn.net/Articles/152462/" target="_blank" rel="noreferrer"&gt;ZONE_DMA32&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ZONE_NORMAL&lt;/code&gt;: (16MB to 896MB), ordinary memory domain that can be directly mapped to the kernel segment; most kernel operations take place in the NORMAL zone, this is the most important zone&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ZONE_HIGHMEM&lt;/code&gt;: (&amp;gt;896MB), marks physical memory beyond the kernel segment, cannot be directly called by the kernel.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Zone distribution diagram for 32-bit and 64-bit:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/13284b0811ff.png" alt="Insert image description" /&gt;
&lt;a href="https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf" target="_blank" rel="noreferrer"&gt;https://users.cs.utah.edu/~aburtsev/cs5460/lectures/lecture19-memory-management/lecture19-memory-management.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Note that zones are for physical memory. Virtual memory must switch from user mode to kernel mode before it can call physical memory. The following diagram shows the relationship between kernel addresses in virtual memory space and zones in physical address space:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0f0adc123018.png" alt="Insert image description" /&gt;
(&lt;a href="https://wr.informatik.uni-hamburg.de/_media/teaching/wintersemester_2014_2015/kp-1415-memory-management.pdf" target="_blank" rel="noreferrer"&gt;https://wr.informatik.uni-hamburg.de/_media/teaching/wintersemester_2014_2015/kp-1415-memory-management.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Inspect zones:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cat /proc/zoneinfo&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cat /proc/buddyinfo&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;cat /proc/pagetypeinfo&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/buddyinfo 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA32 &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2080&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1420&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;995&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;596&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;357&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;278&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;241&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;276&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;133&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;195748&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;204074&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;161167&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;119070&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;70791&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;33578&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9556&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2070&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1034&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2533&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7328&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 1, zone Normal &lt;span style="color:#ae81ff"&gt;11705&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;51467&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36752&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21326&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11343&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7309&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5024&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3403&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2597&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3056&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10898&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Pages
 &lt;div id="pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Virtual memory and physical memory are divided into fixed-size segments, typically 4KB in size. So after virtual memory is divided, we have virtual pages, and after physical memory is divided, we have physical pages (PP or PF, Physical Page or Page Frame), also called page frames, also 4KB. The page frame represents the minimum unit of system memory.&lt;/p&gt;
&lt;p&gt;Each page in the virtual address space can be mapped to a page frame in the physical address space through its descriptor.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Huge Pages / Transparent Huge Pages
 &lt;div id="huge-pages--transparent-huge-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#huge-pages--transparent-huge-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Pages are the minimum unit of memory allocation (default 4K). When mapping and allocating a large number of contiguous pages, performance is poor. &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/performance_tuning_guide/s-memory-transhuge" target="_blank" rel="noreferrer"&gt;Huge Pages&lt;/a&gt; solve this problem. Huge pages are not only cheaper to allocate, but the page table is also relatively smaller. hugepagesz is 2 MB or 1 GB, defaulting to 2MB. Huge Pages were implemented starting from Red Hat 6.&lt;/p&gt;
&lt;p&gt;Since manually managing huge pages is cumbersome, Red Hat 6 also provided automatic huge page management, i.e., &lt;strong&gt;Transparent Huge Pages&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;In Oracle database management, huge pages are generally enabled for SGA use, while transparent huge pages are disabled. There is plenty of related material available for searching.&lt;/p&gt;
&lt;p&gt;Similarly, PostgreSQL can also enable huge pages. Since databases generally occupy more operating system memory, enabling huge pages for databases can generally reduce memory allocation pressure.&lt;/p&gt;

&lt;h3 class="relative group"&gt;File Pages &amp;amp; Page Cache / Anonymous Pages &amp;amp; Swap Cache
 &lt;div id="file-pages--page-cache--anonymous-pages--swap-cache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#file-pages--page-cache--anonymous-pages--swap-cache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;File pages can be mapped to files on disk. File system reads and writes use Page Cache as buffered IO. Dirty data is synced (or fsynced, etc.) to the corresponding disk periodically or when called. Page Cache is the memory area used to &amp;ldquo;boost&amp;rdquo; disk performance.&lt;/p&gt;
&lt;p&gt;Correspondingly, pages without associated files are called Anonymous Pages, generally corresponding to heap and stack. When memory resources are tight, the kernel writes infrequently used anonymous page data to swap partitions or swap files.&lt;/p&gt;
&lt;p&gt;In short:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Page cache corresponds to file mappings&lt;/li&gt;
&lt;li&gt;Swap cache corresponds to anonymous pages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/cc6be7d9bb51.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.slideshare.net/raghusiddarth/memory-management-in-linux-11551521?from_search=2" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/raghusiddarth/memory-management-in-linux-11551521?from_search=2&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The above page cache diagram is from the operating system&amp;rsquo;s perspective. Application (such as database) writes can also be non-delayed, or even bypass Page Cache.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Allocation
 &lt;div id="memory-allocation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-allocation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Memory allocation is also very complex, involving many concepts. Two common memory allocation methods are buddy and slab.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Buddy
 &lt;div id="buddy" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#buddy" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The buddy system is used for allocating contiguous memory pages. Each zone has its own buddy system. The buddy system divides large blocks of memory to respond to memory allocation requests, and due to its coalescing characteristics, it can reduce system memory fragmentation.&lt;/p&gt;
&lt;p&gt;The buddy allocator divides memory into pages of powers of 2, with the maximum order being 10:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/031369fb6ba0.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;When a memory request is larger than the existing block size, the system splits the larger block into two equally sized buddy blocks. When memory is freed, the system attempts to merge adjacent buddy blocks into a larger block:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2eb819ac1f2d.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;When freeing a page, the page is directly placed back into the free list. If the other half of the previously split page is also unallocated, they are combined into a double-sized page and given to the next larger list, and so on, until it can no longer be merged or has reached the top.&lt;/p&gt;
&lt;p&gt;When higher-order pages are depleted due to continuous allocation, fragmentation issues arise when requesting higher-order pages:



&lt;img src="https://lastdba.com/img/csdn/385e337b4093.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;After waiting for memory reclamation to succeed, buddy itself merges lower orders into higher orders, then allocates higher-order pages:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2452398bd6ce.png" alt="Insert image description" /&gt;
(The implementations of anti pages fragmentation in Linux kernel &lt;a href="https://teawater.github.io/presentation/antif.pdf" target="_blank" rel="noreferrer"&gt;https://teawater.github.io/presentation/antif.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;However, memory reclamation may also not keep up with allocation speed, so the buddy system is not always ideal.&lt;/p&gt;
&lt;p&gt;Analysis example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/buddyinfo 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA32 &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;272&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;317681&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38869&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;31620&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19250&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8931&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2579&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;815&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;182&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;The above contains 3 ZONEs: DMA, DMA32, Normal&lt;/li&gt;
&lt;li&gt;Orders: 0 ~ 10, i.e., the count of each order in buddy. The maximum order of buddy is 10, i.e., 1024 pages, which is 4MB&lt;/li&gt;
&lt;li&gt;For example, the 3rd column in the Normal row indicates there are 31620 blocks of 2^2 contiguous memory available&lt;/li&gt;
&lt;li&gt;By extension, the further back, the more contiguous the space. The larger the number, the more contiguous space of that size there is. When large contiguous spaces are scarce, it indicates significant memory fragmentation&lt;/li&gt;
&lt;li&gt;Additionally, summing everything up gives the current free memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Judging memory fragmentation issues through buddyinfo:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#host 1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;317681&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38869&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;31620&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19250&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8931&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2579&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;815&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;182&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#host 2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;7321&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7833&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10885&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8514&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2311&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1644&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1663&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1302&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1141&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7384&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;80675&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The above shows the memory conditions of two hosts. Comparing them, the host below has more contiguous memory, while the host above has memory fragmentation issues.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Slab
 &lt;div id="slab" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#slab" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The slab allocator manages memory &lt;strong&gt;based on objects&lt;/strong&gt;. The slab system is a memory allocation algorithm specifically designed for &lt;strong&gt;kernel&lt;/strong&gt; memory. It works by dividing memory into fixed-size caches, where each slab contains a set of objects of the same type. When there is a memory request, the algorithm first checks if available objects exist in the appropriate slab cache. If they exist, the object is returned. If not, the algorithm allocates a new slab and adds it to the appropriate cache.&lt;/p&gt;
&lt;p&gt;Objects of different sizes correspond to different slab caches:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6da00bf9a4b4.png" alt="Insert image description" /&gt;
(&lt;a href="https://bootlin.com/doc/training/linux-kernel/linux-kernel-slides.pdf" target="_blank" rel="noreferrer"&gt;https://bootlin.com/doc/training/linux-kernel/linux-kernel-slides.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Although slab has different caches and objects, slab still uses physically contiguous memory:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8274eb96e6d6.png" alt="Insert image description" /&gt;
(&lt;a href="https://i.stack.imgur.com/wo8Gg.png" target="_blank" rel="noreferrer"&gt;https://i.stack.imgur.com/wo8Gg.png&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Slab also has 3 implementation methods:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6c784e30b08b.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Memory Reclamation
 &lt;div id="memory-reclamation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-reclamation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Recommended article: &lt;a href="https://blog.csdn.net/weixin_35094083/article/details/116688112" target="_blank" rel="noreferrer"&gt;Linux Forced Memory Reclamation, Linux Memory Source Code Analysis - Memory Reclamation&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Memory Reclamation Overview
 &lt;div id="memory-reclamation-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-reclamation-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;When system memory pressure is high, memory reclamation is performed on each zone under pressure. Memory reclamation mainly targets anonymous pages and file pages.&lt;/li&gt;
&lt;li&gt;For anonymous pages, during memory reclamation, some infrequently used anonymous pages are selected, written to the swap partition, and then released as free page frames to the buddy system.&lt;/li&gt;
&lt;li&gt;For file pages, during memory reclamation, some infrequently used file pages are also selected:
&lt;ul&gt;
&lt;li&gt;If the content saved in this file page is consistent with the corresponding file content on disk, this file page is a clean file page and does not need to be written back; it is directly released as a free page frame to the buddy system.&lt;/li&gt;
&lt;li&gt;If the data saved in the file page is inconsistent with the corresponding data in the file on disk, this file page is considered a dirty page. It must first be written back to the corresponding data location on disk, and then released as a free page frame to the buddy system.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;After memory reclamation completes, the number of free page frames in the system increases, alleviating memory pressure. However, the reclamation process puts significant IO pressure on the system. Therefore, a threshold is set for each zone in the system. When the number of free page frames falls below this threshold, memory reclamation operations are performed. When the number of free page frames meets this threshold, the system does not perform memory reclamation operations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Zone Watermarks and kswapd
 &lt;div id="zone-watermarks-and-kswapd" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#zone-watermarks-and-kswapd" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c611d500255d.png" alt="Insert image description" /&gt;
(&lt;a href="https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/" target="_blank" rel="noreferrer"&gt;https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;When available memory is low, the kswapd daemon is awakened to free pages.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;pages_low&lt;/strong&gt;: When the number of available free pages falls below pages_low, the buddy allocator wakes up the &lt;strong&gt;kswapd&lt;/strong&gt; process, and the kernel begins swapping pages out to disk.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pages_min&lt;/strong&gt;: When the number of available pages reaches pages_min, the pressure of page reclamation work is relatively high because the memory zone urgently needs free pages. The allocator will execute kswapd work in a synchronous manner, sometimes called direct reclaim.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pages_high&lt;/strong&gt;: Once kswapd is awakened and begins freeing pages, the kernel considers the zone &amp;ldquo;balanced&amp;rdquo; only when the number of available pages reaches pages_high. If the watermark reaches pages_high, kswapd will re-enter the sleep state. If free pages exceed pages_high, the kernel considers the zone state ideal.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Memory reclamation is performed on a per-zone basis. &lt;code&gt;/proc/zoneinfo&lt;/code&gt; can display the values of min, low, and high.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vm.min_free_kbytes&lt;/code&gt; is the min_pages watermark, a very important OS parameter. Very low values prevent the system from effectively reclaiming memory, potentially leading to system crashes and service interruptions. Too high values increase system reclamation activity, causing allocation delays, which may lead the system to immediately enter an out-of-memory state.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Types of Memory Allocation and Reclamation
 &lt;div id="types-of-memory-allocation-and-reclamation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#types-of-memory-allocation-and-reclamation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Fast Memory Allocation&lt;/strong&gt;: Performed by the get_page_from_freelist() function, which obtains a suitable zone from the zonelist using the low threshold for allocation. If the zone has not reached the low threshold, fast memory reclamation is performed, and allocation is retried after fast memory reclamation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Slow Memory Allocation&lt;/strong&gt;: When fast allocation fails, meaning no zone in the zonelist obtained memory in fast allocation, the min threshold is used for slow allocation. During slow allocation, three main things happen: asynchronous memory compaction, direct memory reclamation, and light synchronous memory compaction. Finally, OOM allocation may occur depending on the situation. And after each of these operations, fast memory allocation is called once to attempt to obtain page frames.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/85d59cc1ec86.png" alt="Insert image description" /&gt;
(&lt;a href="https://blog.csdn.net/weixin_35094083/article/details/116688112" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_35094083/article/details/116688112&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;Different memory allocation paths trigger different memory reclamation methods. Zone memory reclamation is divided into two types:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Background Memory Reclamation&lt;/strong&gt; (kswapd): When physical memory is tight, the kswapd kernel thread is awakened to reclaim memory. This memory reclamation process is &lt;strong&gt;asynchronous&lt;/strong&gt; and does not block process execution.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Direct Memory Reclamation&lt;/strong&gt; (direct reclaim): If background asynchronous reclamation cannot keep up with process memory application speed, direct reclamation begins. This memory reclamation process is &lt;strong&gt;synchronous&lt;/strong&gt; and blocks process execution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Memory Compaction
 &lt;div id="memory-compaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-compaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Memory compaction: see Memory Monitoring - /proc/pagetypeinfo section&lt;/p&gt;

&lt;h3 class="relative group"&gt;LRU
 &lt;div id="lru" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lru" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;For zone memory reclamation, it targets three things for reclamation: slab, pages in LRU lists, and buffer_head. Here we only discuss memory reclamation targeting LRU lists.&lt;/p&gt;
&lt;p&gt;The main purpose of LRU lists is to sort pages, placing pages most deserving of reclamation at the back and pages least deserving of reclamation at the front. Then, during memory reclamation, scanning proceeds from back to front, attempting to reclaim scanned pages.&lt;/p&gt;
&lt;p&gt;LRU list descriptor, containing 5 LRU lists: active/inactive anonymous page LRU lists, active/inactive file page LRU lists, and unevictable page list:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/958bc46a109a.png" alt="Insert image description" /&gt;
(&lt;a href="https://lpc.events/event/11/contributions/896/attachments/793/1493/slides-r2.pdf" target="_blank" rel="noreferrer"&gt;https://lpc.events/event/11/contributions/896/attachments/793/1493/slides-r2.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;For memory reclamation, it only processes the first 4 LRU lists: active anonymous page LRU list, inactive anonymous page LRU list, active file page LRU list, and inactive file page LRU list. After reclaiming enough page frames, it returns directly: fast memory reclamation and kswapd memory reclamation do this.&lt;/p&gt;
&lt;p&gt;Global lruvec can be viewed through meminfo (understood as LRU areas):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## cat /proc/meminfo |grep -i active&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active: &lt;span style="color:#ae81ff"&gt;597380&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive: &lt;span style="color:#ae81ff"&gt;601920&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active&lt;span style="color:#f92672"&gt;(&lt;/span&gt;anon&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;10896&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive&lt;span style="color:#f92672"&gt;(&lt;/span&gt;anon&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;117376&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active&lt;span style="color:#f92672"&gt;(&lt;/span&gt;file&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;586484&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive&lt;span style="color:#f92672"&gt;(&lt;/span&gt;file&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;484544&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In reality, there is more than one lruvec. cgroup and NUMA nodes each have their own lruvec, and global also has its own lruvec.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d25a7970acd0.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Drop Cache
 &lt;div id="drop-cache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#drop-cache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Drop cache records which pages are caching file system data pages and writes data back to disk when pages are forcibly reclaimed, so they can be cached again on the next access.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Default value: &lt;code&gt;vm.drop_caches = 0&lt;/code&gt;. By default, the Linux kernel does not automatically clear caches.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;/proc/sys/vm/drop_caches&lt;/code&gt; to 1: The kernel clears unused page cache.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;/proc/sys/vm/drop_caches&lt;/code&gt; to 2: The kernel releases memory used by dentry and inode. Dentry and inode are file system metadata structures used to store file and directory information.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting &lt;code&gt;/proc/sys/vm/drop_caches&lt;/code&gt; to 3: Equivalent to 1+2, releases all unused caches.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When the kernel decides to reclaim certain caches, it checks whether the data in the cache is consistent with the data on disk. If the data is inconsistent, the kernel needs to write the data back to disk before reclaiming that cache. This process can cause IO spikes. When performing Drop Cache operations, it is recommended to avoid any important I/O operations as this may affect system performance.&lt;/p&gt;
&lt;p&gt;Operation commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo 3 &amp;gt; /proc/sys/vm/drop_caches # Flush cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;echo 0 &amp;gt; /proc/sys/vm/drop_caches # Restore default&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Memory Monitoring
 &lt;div id="memory-monitoring" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-monitoring" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Without understanding basic memory knowledge, it is actually very difficult to interpret memory monitoring information. With the above memory fundamentals in place, let&amp;rsquo;s go through memory-related monitoring commands and tools one by one.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What&amp;rsquo;s in the /proc Directory?
 &lt;div id="whats-in-the-proc-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#whats-in-the-proc-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;/proc mainly contains process information and system information.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/539736f743ba.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;In the system information part, some are interfaces provided by Linux for system status, allowing you to view monitoring information at the entire operating system level, such as slabinfo, swaps, zoneinfo, buddyinfo.&lt;/p&gt;
&lt;p&gt;The other part, process, contains running data and status information for each process. cd into the corresponding process directory to see the FDs held by the corresponding process and process memory information.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e5f7e542f245.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/de0fd3de265d.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;Processes also have threads. Thread information directory: /proc/[pid]/task/[tid]/, with content similar to the process directory.&lt;/p&gt;
&lt;p&gt;For more proc information, refer to &lt;a href="https://man7.org/linux/man-pages/man5/proc.5.html" target="_blank" rel="noreferrer"&gt;proc(5) — Linux manual page&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;/proc/meminfo
 &lt;div id="procmeminfo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procmeminfo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;/proc/meminfo is the primary interface for understanding the current Linux system memory usage. The most commonly used commands like &lt;code&gt;free&lt;/code&gt;, &lt;code&gt;vmstat&lt;/code&gt;, &lt;code&gt;ps&lt;/code&gt; obtain data through it. /proc/meminfo information is more comprehensive. Below we only list some common information. For detailed meanings, refer to the &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo" target="_blank" rel="noreferrer"&gt;Red Hat documentation&lt;/a&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# General memory information&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep &lt;span style="color:#e6db74"&gt;&amp;#34;Mem&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MemTotal: &lt;span style="color:#ae81ff"&gt;994328&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total memory size (minus some reserved and kernel)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MemFree: &lt;span style="color:#ae81ff"&gt;66428&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Completely unused physical memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MemAvailable: &lt;span style="color:#ae81ff"&gt;207192&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Maximum available memory for starting a new application without using swap space&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# IO buffers&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;Buffers&amp;#34;&lt;/span&gt; -we &lt;span style="color:#e6db74"&gt;&amp;#34;Cached&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Buffers: &lt;span style="color:#ae81ff"&gt;12820&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# IO buffers used by raw disk blocks, not exceeding 20MB&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Cached: &lt;span style="color:#ae81ff"&gt;254592&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Page cache size used by disks (includes tmpfs and shmem, excludes SwapCached)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# swap&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep &lt;span style="color:#e6db74"&gt;&amp;#34;Swap&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SwapCached: &lt;span style="color:#ae81ff"&gt;13936&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Swap cache contains anonymous memory pages determined to be swapped but not yet written to physical swap area&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SwapTotal: &lt;span style="color:#ae81ff"&gt;945416&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total swap space size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SwapFree: &lt;span style="color:#ae81ff"&gt;851064&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Remaining swap size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# lru active and inactive page counts (self-explanatory)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;Active&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;Inactive&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active: &lt;span style="color:#ae81ff"&gt;194308&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive: &lt;span style="color:#ae81ff"&gt;553172&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active&lt;span style="color:#f92672"&gt;(&lt;/span&gt;anon&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;59024&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive&lt;span style="color:#f92672"&gt;(&lt;/span&gt;anon&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;437264&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Active&lt;span style="color:#f92672"&gt;(&lt;/span&gt;file&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;135284&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Inactive&lt;span style="color:#f92672"&gt;(&lt;/span&gt;file&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;115908&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Dirty pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;Dirty&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;Writeback&amp;#34;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Dirty pages not yet written&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Writeback: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Dirty pages being written&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WritebackTmp: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Temporary buffer for writebacks used by the FUSE module&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Map information&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;AnonPages&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;Map&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonPages: &lt;span style="color:#ae81ff"&gt;95296&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Mapped anonymous pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mapped: &lt;span style="color:#ae81ff"&gt;153192&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Mapped file pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DirectMap4k: &lt;span style="color:#ae81ff"&gt;113336&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Mapped 4k kernel pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DirectMap2M: &lt;span style="color:#ae81ff"&gt;1900544&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Mapped 2M kernel pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DirectMap1G: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Mapped 1G kernel pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Shared memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep &lt;span style="color:#e6db74"&gt;&amp;#34;Shmem&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shmem: &lt;span style="color:#ae81ff"&gt;28920&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total memory size of shmem and tmpfs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ShmemHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total huge page memory size of shmem and tmpfs&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ShmemPmdMapped: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Shared memory mapped into userspace with huge pages&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Kernel memory (note: slab is kernel)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -ie &lt;span style="color:#e6db74"&gt;&amp;#34;reclaim&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;slab&amp;#34;&lt;/span&gt; -e &lt;span style="color:#e6db74"&gt;&amp;#34;kernel&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KReclaimable: &lt;span style="color:#ae81ff"&gt;35008&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Reclaimable memory allocated to kernel&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Slab: &lt;span style="color:#ae81ff"&gt;88752&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Slab cache&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SReclaimable: &lt;span style="color:#ae81ff"&gt;35008&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Reclaimable memory in slab cache&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SUnreclaim: &lt;span style="color:#ae81ff"&gt;53744&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Non-reclaimable memory in slab cache&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelStack: &lt;span style="color:#ae81ff"&gt;5988&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Kernel stack memory used by all tasks&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Allocatable memory (different meaning from MemAvailable)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## CommitLimit=[(&amp;#34;total RAM pages&amp;#34; - &amp;#34;total huge TLB pages&amp;#34;) * overcommit_ratio]/100 + &amp;#34;total swap pages&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## In short, MemAvailable watermark plus swap equals allocatable memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -ie &lt;span style="color:#e6db74"&gt;&amp;#34;commit&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CommitLimit: &lt;span style="color:#ae81ff"&gt;1442580&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Allocatable memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Committed_AS: &lt;span style="color:#ae81ff"&gt;3035924&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Estimated memory needed in current worst-case scenario&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Virtual memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -e &lt;span style="color:#e6db74"&gt;&amp;#34;Vmalloc&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmallocTotal: &lt;span style="color:#ae81ff"&gt;34359738367&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total allocated virtual memory size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmallocUsed: &lt;span style="color:#ae81ff"&gt;34780&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Total used virtual memory size&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmallocChunk: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB &lt;span style="color:#75715e"&gt;# Largest contiguous virtual memory block&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Page table memory (self-explanatory)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep PageTables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PageTables: &lt;span style="color:#ae81ff"&gt;4120&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Huge page memory&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/meminfo | grep -i hugepage
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;32768&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ShmemHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FileHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HugePages_Total: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HugePages_Free: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HugePages_Rsvd: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HugePages_Surp: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Hugepagesize: &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;/proc/buddyinfo
 &lt;div id="procbuddyinfo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procbuddyinfo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Due to its concise and easy-to-understand information, buddyinfo is the most commonly used method for judging memory fragmentation issues. See &amp;ldquo;Memory Allocation - Buddy section&amp;rdquo; for details.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/buddyinfo 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone DMA32 &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;272&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal &lt;span style="color:#ae81ff"&gt;317681&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38869&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;31620&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19250&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8931&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2579&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;815&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;182&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;/proc/pagetypeinfo
 &lt;div id="procpagetypeinfo" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procpagetypeinfo" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;pagetypeinfo first provides information about page block sizes. It provides the same type of information as buddyinfo but broken down by type and detailing the number of pages of each type.&lt;/p&gt;
&lt;p&gt;Before understanding pagetypeinfo, you need to first understand &lt;a href="https://lwn.net/Articles/368869/" target="_blank" rel="noreferrer"&gt;memory compaction&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Suppose the memory in a zone looks like this:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/97866485c91f.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;White represents free memory, red represents used memory. The memory fragmentation above is already quite severe. If a request for memory of order 2 or higher is made at this point, it cannot be allocated. This is where memory compaction comes into play. The compaction algorithm marks movable pages and free pages lists on the existing zone.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/706746415abf.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;The movable scanner scans from bottom to top, and the free scanner scans from top to bottom. The movable and free scanners will eventually meet at some point in the middle. Then, through &lt;a href="https://lwn.net/Articles/157066/" target="_blank" rel="noreferrer"&gt;page migration&lt;/a&gt;, used pages are moved to the top of the zone.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8b308995434f.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Two trigger methods for page compaction&lt;/em&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When allocating pages, if allocation fails at the LOW watermark, slow memory allocation is attempted, during which page compaction occurs&lt;/li&gt;
&lt;li&gt;Page compaction can be started with &lt;code&gt;echo x &amp;gt; /proc/sys/vm/compact_memory&lt;/code&gt;. After starting, the kernel thread &lt;code&gt;kcompactd&lt;/code&gt; begins page defragmentation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Because page data is migrated to new locations, there are no performance issues as severe as those caused by memory reclamation. Moreover, since the goal is clearer, the cost of obtaining contiguous pages is lower. Additionally, ANON page reclamation requires SWAP, while this does not.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s look at /proc/pagetypeinfo:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat /proc/pagetypeinfo 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Page block order: &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pages per block: &lt;span style="color:#ae81ff"&gt;512&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;... &lt;span style="color:#f92672"&gt;(&lt;/span&gt;DMA omitted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type Unmovable &lt;span style="color:#ae81ff"&gt;870&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;530&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;391&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;157&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;103&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type Movable &lt;span style="color:#ae81ff"&gt;5886&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9235&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5728&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4072&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1561&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;324&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;115&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13018&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type Reclaimable &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type HighAtomic &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type CMA &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Node 0, zone Normal, type Isolate &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Different pages are classified as pageblocks. Each pageblock is divided into several lists based on its type. When allocating memory, pages are requested from the corresponding list based on the requested page type, and when freed, they return to the corresponding list based on their pageblock. Different pageblocks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Unmovable: Pages that cannot be compacted&lt;/li&gt;
&lt;li&gt;Movable: Pages that can be compacted&lt;/li&gt;
&lt;li&gt;Reclaimable: Pages that can be reclaimed&lt;/li&gt;
&lt;li&gt;HighAtomic: Pageblock added to mitigate fragmentation issues. Only higher-order and same-level requests can request pages from this pageblock&lt;/li&gt;
&lt;li&gt;CMA: CMA stands for Contiguous Memory Allocator&lt;/li&gt;
&lt;li&gt;Isolate: Pages will not be allocated; used to help isolate pages. When isolating pages, pageblocks are first set to isolate to prevent them from being freed&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;CMA appears to be another large topic, which can be simply understood as a supplement to the buddy system:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a9cd9dcbfe80.png" alt="Insert image description" /&gt;
(Memory Journey — How to Improve CMA Utilization? &lt;a href="https://ost.51cto.com/posts/10815" target="_blank" rel="noreferrer"&gt;https://ost.51cto.com/posts/10815&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;smaps &amp;amp; maps &amp;amp; pmap
 &lt;div id="smaps--maps--pmap" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#smaps--maps--pmap" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;VSS/RSS/PSS/USS&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;When viewing the memory occupied by a process, there are commonly four forms: VSS/RSS/PSS/USS, mainly differing in memory calculation methodology.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/21b9b29f53a3.png" alt="Insert image description" /&gt;
(&lt;a href="https://cloud.tencent.com/developer/article/1683708" target="_blank" rel="noreferrer"&gt;https://cloud.tencent.com/developer/article/1683708&lt;/a&gt;)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VSS (Virtual Set Size) is just a virtual space size, with little significance for actual memory usage.&lt;/li&gt;
&lt;li&gt;RSS (Resident Set Size) is used for calculating the total memory occupied by a process, including shared memory size occupied by shared libraries. For example, if private memory size is N and shared memory size is M, then RSS = N + M. This can be misleading, because for large shared libraries like libc, shared by many processes, counting it all against one process is not scientific.&lt;/li&gt;
&lt;li&gt;PSS (Proportional Set Size) is the actual physical memory occupied by a single process when running, including proportionally allocated shared library memory. If a shared library is used by N processes, the size proportionally allocated to PSS is 1/N. PSS calculates process memory more accurately, including exclusive memory plus the shared portion.&lt;/li&gt;
&lt;li&gt;USS (Unique Set Size) is the physical memory exclusively occupied by a process, not including shared memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;/proc/[pid]/maps&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;/proc/[pid]/maps can view the &lt;strong&gt;user space&lt;/strong&gt; memory mappings of the &lt;strong&gt;process&amp;rsquo;s&lt;/strong&gt; &lt;strong&gt;virtual memory&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl 2345&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat maps 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;StartAddr-EndAddr Perms Offset Dev Inode Filename
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00400000-00bae000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:00 &lt;span style="color:#ae81ff"&gt;1093852&lt;/span&gt; /pg/pg15.3/bin/postgres ---text segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00dad000-00dc3000 rw-p 007ad000 fd:00 &lt;span style="color:#ae81ff"&gt;1093852&lt;/span&gt; /pg/pg15.3/bin/postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00dc3000-00df5000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00f1e000-00f60000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt; ---heap area
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;33a6000000-33a6022000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:00 &lt;span style="color:#ae81ff"&gt;1976006&lt;/span&gt; /lib64/ld-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe2ae09000-7fbe2ae0a000 rw-p 0000c000 fd:00 &lt;span style="color:#ae81ff"&gt;1975966&lt;/span&gt; /lib64/libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe2ae1b000-7fbe33ca7000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;12556&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe33ca7000-7fbe39b38000 r--p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:00 &lt;span style="color:#ae81ff"&gt;1181300&lt;/span&gt; /usr/lib/locale/locale-archive
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe39b38000-7fbe39b3d000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe39b46000-7fbe39b4d000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:10 &lt;span style="color:#ae81ff"&gt;12559&lt;/span&gt; /dev/shm/PostgreSQL.3661351388
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe39b4d000-7fbe39b4e000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;32769&lt;/span&gt; /SYSV0010c0b6 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe39b4e000-7fbe39b4f000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fffe3933000-7fffe3948000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;stack&lt;span style="color:#f92672"&gt;]&lt;/span&gt; --stack area
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fffe397d000-7fffe397e000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;vdso&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000-ffffffffff601000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;vsyscall&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(1) Start-End Address: The address range of this segment in virtual memory
(2) Permissions: Permissions of this segment; r-read, w-write, x-execute, p-private
(3) Offset: The offset of this segment mapping in the file
(4) Device: The device number of the device where the mapped file resides, corresponding to vm_file-&amp;gt;f_dentry-&amp;gt;d_inode-&amp;gt;i_sb-&amp;gt;s_dev. &lt;strong&gt;Anonymous mappings have 0. fd is the major device number, 00 is the minor device number.&lt;/strong&gt;
(5) Inode: Corresponds to vm_file-&amp;gt;f_dentry-&amp;gt;d_inode-&amp;gt;i_ino, &lt;strong&gt;matches the content displayed by ls -i, anonymous mappings have 0.&lt;/strong&gt;
(6) Mapped File Name: For named mappings, it&amp;rsquo;s the mapped file name. For anonymous mappings, it&amp;rsquo;s the role of this memory segment in the process.&lt;/p&gt;
&lt;p&gt;Below is an analysis by Wenxin (it actually analyzed it correctly, this is a PostgreSQL postmaster process):&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/639e6e130d4f.png" alt="Insert image description" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;/proc/[pid]/smaps&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The /proc/[pid]/smaps file is an extension based on /proc/[pid]/maps, providing more detailed information than the maps file in the same directory. Each VMA has the following series of data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl 2345&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat smaps 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00400000-00bae000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:00 &lt;span style="color:#ae81ff"&gt;1093852&lt;/span&gt; /pg/pg15.3/bin/postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;7864&lt;/span&gt; kB --VSS memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;408&lt;/span&gt; kB --RSS memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;140&lt;/span&gt; kB --PSS memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Clean: &lt;span style="color:#ae81ff"&gt;404&lt;/span&gt; kB --Shared, clean memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Shared, dirty &lt;span style="color:#f92672"&gt;(&lt;/span&gt;i.e., modified&lt;span style="color:#f92672"&gt;)&lt;/span&gt; memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Clean: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB --Private, clean memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Private, dirty memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Referenced: &lt;span style="color:#ae81ff"&gt;408&lt;/span&gt; kB --Current page marked as referenced or containing anonymous mappings
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Anonymous: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Anonymous pages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Anonymous huge pages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB --Swapped-out memory size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB --Kernel page size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MMUPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB --Page table page size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fffe3933000-7fffe3948000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;stack&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now we know that maps are the process&amp;rsquo;s memory mapping information, and smaps also includes the memory size of each mapping segment (VSS, RSS, PSS).&lt;/p&gt;
&lt;p&gt;You can calculate a process&amp;rsquo;s memory usage by looking at PSS, RSS, etc. data in process smaps. Note the unit is KB.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Total physical memory usage of all processes&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;grep Pss /proc/&lt;span style="color:#f92672"&gt;[&lt;/span&gt;1-9&lt;span style="color:#f92672"&gt;]&lt;/span&gt;*/smaps | awk &lt;span style="color:#e6db74"&gt;&amp;#39;{total+=$2}; END {printf &amp;#34;%d kB\n&amp;#34;, total }&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;PSS memory of a specific process&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/90875/smaps |grep Pss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;RSS memory of a specific process&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps |grep Rss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Private memory of a specific process&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/90875/smaps|sed &lt;span style="color:#e6db74"&gt;&amp;#39;/zero/,/VmFlags/d&amp;#39;&lt;/span&gt; |grep Private |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;pmap&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The pmap command parses the /proc/[pid]/maps and /proc/[pid]/smaps files. It has few parameters; -x means show more information.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# pmap -x 2345&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2345: /pg/pg15.3/bin/postgres -D /pg/1503data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Address Kbytes RSS Dirty Mode Mapping
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000400000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7864&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;212&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000dad000 &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; rw--- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000dc3000 &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000f1e000 &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00000033a6000000 &lt;span style="color:#ae81ff"&gt;136&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;108&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- ld-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe2ae09000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe2ae1b000 &lt;span style="color:#ae81ff"&gt;145968&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4396&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4396&lt;/span&gt; rw-s- zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe33ca7000 &lt;span style="color:#ae81ff"&gt;96836&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r---- locale-archive
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b38000 &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b46000 &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; rw-s- PostgreSQL.3661351388
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b4d000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw-s- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x8001 &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b4e000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe3933000 &lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; stack &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe397d000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------- ------ ------ ------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total kB &lt;span style="color:#ae81ff"&gt;268896&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5532&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4540&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The pmap output format is similar to /proc/[pid]/maps, with one line per VMA address, but includes VSS and RSS in addition to maps, allowing you to directly see the size used by each region of the process&amp;rsquo;s virtual memory, helping to quickly determine where the regions with more memory are.&lt;/p&gt;
&lt;p&gt;If the [heap] in the address space is too large, it might be a heap memory leak. For another example, if the process address space contains too many VMAs (each line in maps can be understood as a VMA), it&amp;rsquo;s likely that the application called many mmaps without munmap. Or, continuously observing changes in the address space — if certain entries are continuously growing, there&amp;rsquo;s likely an issue there.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;Analysis Example&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;From the host&amp;rsquo;s TOP memory view, a certain PostgreSQL backend process memory appears relatively high. Further analysis of map information is needed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;68729&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5579004&lt;/span&gt; 5.116g 5.114g R 97.4 1.4 128:27.94 postgres: lzl: lzldb lzl 30.78.14.174&lt;span style="color:#f92672"&gt;(&lt;/span&gt;58067&lt;span style="color:#f92672"&gt;)&lt;/span&gt; DELETE &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Check this process&amp;rsquo;s Rss, Pss, Uss:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps |grep Rss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;5422.67 ---5.4G Rss
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps |grep Pss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;467.957 ---467mb Pss
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cat /proc/68729/smaps|sed &lt;span style="color:#e6db74"&gt;&amp;#39;/zero/,/VmFlags/d&amp;#39;&lt;/span&gt; |grep Private |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;179.605 ---179mb Uss&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Rss-Uss=5.3G of shared memory. From Pss-Uss=290mb of proportional shared memory, we can roughly see that this backend is only a small portion of this shared memory proportion.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ pmap -x &lt;span style="color:#ae81ff"&gt;68729&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;68729: postgres: pdmp: pdmpdata pdmp 30.78.14.174&lt;span style="color:#f92672"&gt;(&lt;/span&gt;46252&lt;span style="color:#f92672"&gt;)&lt;/span&gt; DELETE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Address Kbytes RSS Dirty Mode Mapping
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000400000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6084&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2444&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000bf0000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; r---- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000bf1000 &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; rw--- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b7f65bfa000 &lt;span style="color:#ae81ff"&gt;5441216&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5365444&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5365444&lt;/span&gt; rw-s- zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt; --this part takes the most
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1daa000 &lt;span style="color:#ae81ff"&gt;48&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1db6000 &lt;span style="color:#ae81ff"&gt;2044&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; ----- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1fb5000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; r---- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1fb6000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; rw--- libnss_files-2.17.so
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80b1fb7000 &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00002b80ba001000 &lt;span style="color:#ae81ff"&gt;516&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;516&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;516&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe16f7000 &lt;span style="color:#ae81ff"&gt;132&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; stack &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe175b000 &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Diving deeper into smap analysis, we can directly locate the zero (deleted) part:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat smaps 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00400000-009f1000 r-xp &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; fd:06 &lt;span style="color:#ae81ff"&gt;58726481&lt;/span&gt; /paic/postgres/base/9.6.6/bin/postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2b7f65bfa000-2b80b1daa000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;72254&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;5441216&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;5365444&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;264618&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Dirty: &lt;span style="color:#ae81ff"&gt;5365444&lt;/span&gt; kB --shared dirty data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Referenced: &lt;span style="color:#ae81ff"&gt;5364764&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Anonymous: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MMUPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Locked: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmFlags: rd wr sh mr mw me ms sd &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the above analysis, we can conclude: this is a PostgreSQL private process that has modified a large amount of data without flushing dirty pages. Its own private memory is not much; most is occupied in shared memory. This is likely a transaction in PostgreSQL that has modified a lot of data but hasn&amp;rsquo;t committed yet.&lt;/p&gt;
&lt;p&gt;Additionally, /dev/zero (deleted) is explained in &lt;a href="https://www.man7.org/linux/man-pages/man5/proc.5.html" target="_blank" rel="noreferrer"&gt;proc(5) — Linux manual page&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Although these entries are present for memory regions that were mapped with the MAP_FILE flag, the way anonymous shared memory (regions created with the MAP_ANON | MAP_SHARED flags) is implemented in Linux means that such regions also appear on this directory. Here is an example where the target file is the deleted /dev/zero one:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; lrw-------. 1 root root 64 Apr 16 21:33
 7fc075d2f000-7fc075e6f000 -&amp;gt; /dev/zero (deleted)
&lt;/code&gt;&lt;/pre&gt;
&lt;/blockquote&gt;&lt;p&gt;&amp;ldquo;Unofficial translation&amp;rdquo;: Anonymous pages and shared pages are represented by /dev/zero (deleted).&lt;/p&gt;

&lt;h3 class="relative group"&gt;/proc/[pid]/status
 &lt;div id="procpidstatus" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procpidstatus" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;status can view process state information, including some memory information.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 2345&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# cat status &lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Name: postgres ---the command running this thread
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;State: S &lt;span style="color:#f92672"&gt;(&lt;/span&gt;sleeping&lt;span style="color:#f92672"&gt;)&lt;/span&gt; ---process state
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Tgid: &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; ---Thread group ID &lt;span style="color:#f92672"&gt;(&lt;/span&gt;i.e., Process ID&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pid: &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; ---Thread ID
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PPid: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ---PID of parent process.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmPeak: &lt;span style="color:#ae81ff"&gt;268964&lt;/span&gt; kB ---virtual memory peak
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmSize: &lt;span style="color:#ae81ff"&gt;268896&lt;/span&gt; kB ---virtual memory current
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmLck: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmHWM: &lt;span style="color:#ae81ff"&gt;13400&lt;/span&gt; kB ---RSS peak
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmRSS: &lt;span style="color:#ae81ff"&gt;5532&lt;/span&gt; kB ---RSS current
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmData: &lt;span style="color:#ae81ff"&gt;528&lt;/span&gt; kB ---data segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmStk: &lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; kB ---stack segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmExe: &lt;span style="color:#ae81ff"&gt;7864&lt;/span&gt; kB ---text segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmLib: &lt;span style="color:#ae81ff"&gt;3100&lt;/span&gt; kB ---shared library code segment
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmPTE: &lt;span style="color:#ae81ff"&gt;136&lt;/span&gt; kB ---Page table entries
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VmSwap: &lt;span style="color:#ae81ff"&gt;308&lt;/span&gt; kB ---swap size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Threads: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ---number of threads in this process
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;....&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Compared to maps, status has no mapping information. The memory data is more summarized, allowing for a more intuitive view of the size occupied by each segment of virtual memory.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;View processes with the most SWAP usage&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; file in /proc/*/status ; &lt;span style="color:#66d9ef"&gt;do&lt;/span&gt; awk &lt;span style="color:#e6db74"&gt;&amp;#39;/VmSwap|Name|^Pid/{printf $2 &amp;#34; &amp;#34; $3}END{ print &amp;#34;&amp;#34;}&amp;#39;&lt;/span&gt; $file; &lt;span style="color:#66d9ef"&gt;done&lt;/span&gt; | sort -k &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; -n -r | head&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;cgroup memory
 &lt;div id="cgroup-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cgroup-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt" target="_blank" rel="noreferrer"&gt;cgroup memory control&lt;/a&gt; is now very common. Some host parameters need to be set in cgroup. Memory settings and monitoring information are under /sys/fs/cgroup/memory/.&lt;/p&gt;
&lt;p&gt;cginfo to view CGROUP memory allocation and usage: /opt/cgtools/cginfo -t perf -s mem&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;cginfo -t perf -s mem
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;====================&lt;/span&gt; Cgroup Performance: memory &lt;span style="color:#f92672"&gt;====================&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DB_TYPE INSTANCE_NAME MEM_OOM MEM_FILE_GB MEM_MAP_GB MEM_USED_GB MEM_ALLO_GB ALLO_RATE MEM_GLOB_GB GLOB_RATE 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------- ------------- ------- ----------- ---------- ----------- ----------- --------- ----------- --------- 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres LZLDB &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 154.3 0.0 4.2 160.0 2.6% &lt;span style="color:#ae81ff"&gt;375&lt;/span&gt; 1.1% &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View relatively detailed CGROUP memory usage status: /sys/fs/cgroup/memory/[group]/memory.stat&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cat memory.stat 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_cache &lt;span style="color:#ae81ff"&gt;167791534080&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_rss &lt;span style="color:#ae81ff"&gt;4006932480&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_rss_huge &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_mapped_file &lt;span style="color:#ae81ff"&gt;11747328&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_swap &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_pgpgin &lt;span style="color:#ae81ff"&gt;792754417976&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_pgpgout &lt;span style="color:#ae81ff"&gt;792712474991&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_pgfault &lt;span style="color:#ae81ff"&gt;477971874868&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_pgmajfault &lt;span style="color:#ae81ff"&gt;97318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_inactive_anon &lt;span style="color:#ae81ff"&gt;1610874880&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_active_anon &lt;span style="color:#ae81ff"&gt;2408255488&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_inactive_file &lt;span style="color:#ae81ff"&gt;73446166528&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_active_file &lt;span style="color:#ae81ff"&gt;94332768256&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total_unevictable &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;smem
 &lt;div id="smem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#smem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://linux.die.net/man/8/smem" target="_blank" rel="noreferrer"&gt;smem&lt;/a&gt; is a powerful tool for displaying memory usage. It reads information from smaps, meminfo, etc. under /proc and outputs summaries. smem can output overall and specific map memory conditions, which is very intuitive and can be analyzed from different dimensions. Overall, it&amp;rsquo;s a very useful tool for analyzing memory usage.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://selenic.com/repo/smem" target="_blank" rel="noreferrer"&gt;repo&lt;/a&gt; can be downloaded directly. Basically, just extract and use it. For more usage, refer to &lt;a href="https://www.selenic.com/smem/" target="_blank" rel="noreferrer"&gt;smem memory reporting tool&lt;/a&gt;. Below are just simple examples:&lt;/p&gt;
&lt;p&gt;View system memory usage &lt;code&gt;-w&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# smem -w -k&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Area Used Cache Noncache 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;firmware/hardware &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kernel image &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;kernel dynamic memory 183.9M 84.0M 99.9M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;userspace memory 112.3M 62.2M 50.1M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;free memory 700.3M 700.3M &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View memory consumption per user &lt;code&gt;-u&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# smem -s pss -urk&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;User Count Swap USS PSS RSS 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oracle &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt; 85.2M 30.8M 95.7M 383.0M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;root &lt;span style="color:#ae81ff"&gt;93&lt;/span&gt; 112.4M 38.5M 42.3M 86.2M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; 5.9M 1.6M 2.5M 5.9M 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;mysql &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; 169.7M 1.7M 1.7M 2.0M &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View memory consumption for a specific user &lt;code&gt;-U&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# smem -U pg -k&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID User Command Swap USS PSS RSS 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; pg /pg/pg15.3/bin/postgres -D 364.0K 124.0K 134.0K 228.0K 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2352&lt;/span&gt; pg postgres: logical replicati 636.0K 144.0K 161.0K 196.0K 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Filter a specific process &lt;code&gt;-P&lt;/code&gt; (PROCESSFILTER, not pid):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root@lzl ~]# smem -P postgres -p
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID User Command Swap USS PSS RSS 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2346 pg /pg/pg16.0/bin/postgres -D 0.01% 0.01% 0.01% 0.01% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2350 pg postgres: walwriter 0.01% 0.01% 0.01% 0.01% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View process mapping and memory usage &lt;code&gt;-m&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# smem -P postgres -mpr -s pss&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Map PIDs AVGPSS PSS 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;lt;anonymous&amp;gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; 0.02% 0.24% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 0.07% 0.20% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/usr/lib64/libpython2.6.so.1.0 &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; 0.11% 0.11% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/pg/pg15.3/bin/postgres &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 0.01% 0.06% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/pg/pg16.0/bin/postgres &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; 0.01% 0.06% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;/dev/zero &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; 0.00% 0.03% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;stack&lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; 0.00% 0.02% 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;smem is very intuitive for viewing process USS\PSS\RSS. However, there is one issue: smem cannot filter by pid, only by username or PROCESSFILTER. When a host has multiple database instances deployed, filtering by parent PID or child PID is not very friendly.&lt;/p&gt;

&lt;h3 class="relative group"&gt;top
 &lt;div id="top" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#top" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/top.1.html" target="_blank" rel="noreferrer"&gt;top&lt;/a&gt; can display system running status in real time. top can be quite fancy in its usage. Running top directly can also display a lot of information.&lt;/p&gt;
&lt;p&gt;Sorting in top:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;command sorted-field supported
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M %MEM Yes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;N PID Yes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;P %CPU Yes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;T TIME+ Yes&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can use %MEM to sort processes with higher memory usage. %MEM represents the RES memory percentage.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;top - 23:38:01 up &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; days, 22:32, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; users, load average: 1.12, 1.42, 1.09
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Tasks: &lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; total, &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; running, &lt;span style="color:#ae81ff"&gt;183&lt;/span&gt; sleeping, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; stopped, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; zombie
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Cpu&lt;span style="color:#f92672"&gt;(&lt;/span&gt;s&lt;span style="color:#f92672"&gt;)&lt;/span&gt;: 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mem: 1020348k total, 325848k used, 694500k free, 1352k buffers
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: 4128760k total, 635872k used, 3492888k free, 150288k cached
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;18537&lt;/span&gt; oracle &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 636m 24m 21m S 0.0 2.4 0:05.41 oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;18533&lt;/span&gt; oracle &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 638m 24m 21m S 0.0 2.4 0:02.01 oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;18509&lt;/span&gt; oracle &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 634m &lt;span style="color:#ae81ff"&gt;4384&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4036&lt;/span&gt; S 0.0 0.4 0:01.93 oracle
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2639&lt;/span&gt; root &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 729m &lt;span style="color:#ae81ff"&gt;4052&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1444&lt;/span&gt; S 0.0 0.4 8:45.32 nautilus &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Memory-related interpretation:&lt;/p&gt;
&lt;p&gt;Line 4: Memory usage information: physical memory amount, used memory, free memory, buffer memory
Line 5: Swap partition information: available swap total, used swap total, free swap total, kernel cached amount&lt;/p&gt;
&lt;p&gt;Line 6 (memory-related):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VIRT: VSS&lt;/li&gt;
&lt;li&gt;RES: RSS (likely), anything occupying physical memory&lt;/li&gt;
&lt;li&gt;SHR: Shared Memory Size. It will include shared anonymous pages and shared file-backed pages&lt;/li&gt;
&lt;li&gt;%MEM: RSS percentage, a task&amp;rsquo;s currently resident share of available physical memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additionally, don&amp;rsquo;t forget to look at the process status when checking memory.&lt;/p&gt;
&lt;p&gt;S (example column 8) Process Status:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;D = uninterruptible sleep. Indicates the process is waiting for an external event to complete, such as disk I/O operations or network requests. Usually, D processes cannot be directly terminated.&lt;/li&gt;
&lt;li&gt;I = idle&lt;/li&gt;
&lt;li&gt;R = running&lt;/li&gt;
&lt;li&gt;S = sleeping&lt;/li&gt;
&lt;li&gt;T = stopped by job control signal&lt;/li&gt;
&lt;li&gt;t = stopped by debugger during trace&lt;/li&gt;
&lt;li&gt;Z = zombie&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The top command can see the host&amp;rsquo;s memory summary information. Process memory usage information includes RSS and SHR. A rough calculation of RES-SHR=USS can also calculate the private memory usage size. Additionally, you can see process status, so top -p to view basic memory information for a specific process is very useful.&lt;/p&gt;

&lt;h3 class="relative group"&gt;free
 &lt;div id="free" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#free" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/free.1.html" target="_blank" rel="noreferrer"&gt;free&lt;/a&gt; displays the host&amp;rsquo;s swap, total and remaining memory, all parsed from /proc/meminfo.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;user@ubuntu:~$ free
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; total used free shared buff/cache available
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Mem: 8029356 794336 6297928 183384 937092 6816804
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: 0 0 0&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;total: Total usable memory (MemTotal and SwapTotal in /proc/meminfo). This includes the physical and swap memory minus a few reserved bits and kernel binary code.&lt;/li&gt;
&lt;li&gt;used: Used or unavailable memory (calculated as total - available)&lt;/li&gt;
&lt;li&gt;free: Unused memory (MemFree and SwapFree in /proc/meminfo) shared Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)&lt;/li&gt;
&lt;li&gt;buffers: Memory used by kernel buffers (Buffers in /proc/meminfo)&lt;/li&gt;
&lt;li&gt;cache: Memory used by the page cache and slabs (Cached and SReclaimable in /proc/meminfo). Not just pagecache, but also SReclaimable slab!&lt;/li&gt;
&lt;li&gt;buff/cache: Sum of buffers and cache&lt;/li&gt;
&lt;li&gt;available: cache includes pagecache and SReclaimable, free includes mem free and swap free; while available includes pagecache and memory about to be reclaimed. Indicates available memory, but their calculation methods differ. In practical applications, due to cache existence, available is usually larger than free.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Page Cache:
Page cache is primarily used as a cache for file data on the file system, especially when processes have read/write operations on files.&lt;/p&gt;
&lt;p&gt;Buffer Cache:
Buffer cache is primarily designed for caching blocks when the system reads/writes block devices.&lt;/p&gt;

&lt;h3 class="relative group"&gt;ps aux
 &lt;div id="ps-aux" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ps-aux" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The biggest advantage of ps is analyzing process status (including memory) from the process perspective. Processes with [ ] flags in the COMMAND are kernel processes.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg@lzl ~]$ ps aux|head -1;ps aux|grep postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2345 0.0 0.0 268896 236 ? Ss Jan01 0:03 /pg/pg15.3/bin/postgres -D /pg/1503data
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2353 0.0 0.0 269040 196 ? Ss Jan01 0:00 postgres: checkpointer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2354 0.0 0.0 269032 160 ? Ss Jan01 0:02 postgres: background writer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2356 0.0 0.0 269032 116 ? Ss Jan01 0:01 postgres: walwriter
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2357 0.0 0.0 270508 824 ? Ss Jan01 0:02 postgres: autovacuum launcher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 2358 0.0 0.0 270492 620 ? Ss Jan01 0:00 postgres: logical replication launcher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg 29818 0.0 0.0 103372 868 pts/0 S+ 09:16 0:00 grep postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;VSZ and RSS units are KB. Memory information is limited; VSZ has little value, RSS can be referenced, but there&amp;rsquo;s no PSS or USS type information, so not much can be analyzed.&lt;/p&gt;

&lt;h3 class="relative group"&gt;ipcs
 &lt;div id="ipcs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ipcs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;ipcs -m&lt;/code&gt; is a command for querying IPC (Interprocess Communication) shared memory resources. It&amp;rsquo;s quite useful when analyzing shared memory.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ipcs -m
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------ Shared Memory Segments --------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;key shmid owner perms bytes nattch status 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0x0010c0b6 &lt;span style="color:#ae81ff"&gt;32769&lt;/span&gt; pg &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Shared memory key value&lt;/li&gt;
&lt;li&gt;Shared memory ID (shmid)&lt;/li&gt;
&lt;li&gt;User who created this shared memory&lt;/li&gt;
&lt;li&gt;Permissions (perms)&lt;/li&gt;
&lt;li&gt;Created size (bytes)&lt;/li&gt;
&lt;li&gt;Number of processes attached to this shared memory (nattach)&lt;/li&gt;
&lt;li&gt;Shared memory status&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When connecting a session to PostgreSQL, one more backend process appears:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------ Shared Memory Segments --------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;key shmid owner perms bytes nattch status 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0x0010c0b6 &lt;span style="color:#ae81ff"&gt;32769&lt;/span&gt; pg &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;nattch+1, indicating that the private backend process also shares a portion of the PG shared memory. At this point, the following diagram is understood more deeply:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/75c502689001.png" alt="Insert image description" /&gt;
(&lt;a href="http://gauss.ececs.uc.edu/Courses/c4029/code/memory/virtual.pdf" target="_blank" rel="noreferrer"&gt;http://gauss.ececs.uc.edu/Courses/c4029/code/memory/virtual.pdf&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;vmstat
 &lt;div id="vmstat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vmstat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man8/vmstat.8.html" target="_blank" rel="noreferrer"&gt;vmstat&lt;/a&gt; is an abbreviation for Virtual Memory Statistics, and can monitor the operating system&amp;rsquo;s virtual memory, processes, and CPU activity. It provides statistics on the overall system situation; the shortcoming is that it cannot perform in-depth analysis of a specific process.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Useful&lt;/em&gt; parameter explanations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;vmstat &lt;span style="color:#f92672"&gt;[&lt;/span&gt;options&lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;delay &lt;span style="color:#f92672"&gt;[&lt;/span&gt;count&lt;span style="color:#f92672"&gt;]]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;OPTIONS:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-a Display active and inactive memory
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-m Display slabinfo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-s Display memory-related statistics and various system activity counts
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-t Append timestamp to each line
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-w Wide output mode. Without w, the output is narrow, reducing alignment issues&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-bash-4.1$ vmstat -w &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; r b swpd free buff cache si so bi bo in cs us sy id wa st
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;661652&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;763348&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;324&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;76100&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;54&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;45&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;79&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;661652&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;763340&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;304&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;75764&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;661652&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;760744&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;244&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;78300&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;228&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3216&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;265&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;442&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/95f93eae0f6e.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;pidstat
 &lt;div id="pidstat" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pidstat" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/pidstat.1.html" target="_blank" rel="noreferrer"&gt;pidstat&lt;/a&gt; is a command from the sysstat tool, used to monitor all or specified processes&amp;rsquo; CPU, memory, threads, device IO, and other system resource usage.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Useful&lt;/em&gt; parameter explanations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pidstat OPTIONS interval &lt;span style="color:#f92672"&gt;[&lt;/span&gt; count &lt;span style="color:#f92672"&gt;]&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-d :Report I/O statistics 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-u :Report CPU utilization
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-r :Report page faults and memory utilization
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-w :Report task switching activity
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-p :pid&lt;span style="color:#f92672"&gt;[&lt;/span&gt;,...&lt;span style="color:#f92672"&gt;]&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-l :Display the process command name and all its arguments.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View memory status of a specific process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-bash-4.1$ pidstat -r -l -p &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Linux 2.6.32-431.el6.x86_64 &lt;span style="color:#f92672"&gt;(&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;)&lt;/span&gt; 01/06/2024 _x86_64_ &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; CPU&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;02:48:32 PM PID minflt/s majflt/s VSZ RSS %MEM Command
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;02:48:32 PM &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; 0.23 0.00 &lt;span style="color:#ae81ff"&gt;268896&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240&lt;/span&gt; 0.02 /pg/pg15.3/bin/postgres -D /pg/1503data &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Various indicators are relatively easy to understand. VSZ, RSS — tired of talking about them.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;minflt/s: Abbreviation for &amp;ldquo;minor page faults&amp;rdquo;, indicating the number of &amp;ldquo;minor page faults&amp;rdquo; that occur per second. A page fault occurs when a program tries to access a page that is not in physical memory. If the page is indeed in the swap area on disk, this is a minor page fault.&lt;/li&gt;
&lt;li&gt;majflt/s: Abbreviation for &amp;ldquo;major page faults&amp;rdquo;, indicating the number of &amp;ldquo;major page faults&amp;rdquo; that occur per second. Unlike minor page faults, major page faults occur when a program tries to access a page that is not in physical memory and is also not in the swap area on disk.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;sar
 &lt;div id="sar" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sar" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/sar.1.html" target="_blank" rel="noreferrer"&gt;sar&lt;/a&gt; (System Activity Reporter) is currently one of the most comprehensive system performance analysis tools on Linux. It can report on various aspects of system activity, including: file read/write status, system call usage, disk I/O, CPU efficiency, memory usage, process activity, and IPC-related activity. The SAR tool is part of the sysstat software package.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/886494f9c001.png" alt="Insert image description" /&gt;
(&lt;a href="https://www.brendangregg.com/Perf/linux_observability_sar.png" target="_blank" rel="noreferrer"&gt;https://www.brendangregg.com/Perf/linux_observability_sar.png&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;sar is very powerful. The man parameter introduction alone has over 1k lines. This article cannot possibly explain everything (being lazy).&lt;/p&gt;
&lt;p&gt;Memory-related parameters:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sar OPTIONS interval &lt;span style="color:#f92672"&gt;[&lt;/span&gt; count &lt;span style="color:#f92672"&gt;]&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-B :Report paging statistics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-r :Report memory utilization statistics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-W :Report swapping statistics.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-H :Report hugepages utilization statistics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-s &lt;span style="color:#f92672"&gt;[&lt;/span&gt; start_time &lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt; -e &lt;span style="color:#f92672"&gt;[&lt;/span&gt; end_time &lt;span style="color:#f92672"&gt;]&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Example: sar view memory utilization
&lt;code&gt;sar -r 1 3&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;kbmemfree: This value is basically consistent with the free value in the free command, so it does not include buffer and cache space&lt;/li&gt;
&lt;li&gt;kbmemused: This value is basically consistent with the used value in the free command, so it includes buffer and cache space&lt;/li&gt;
&lt;li&gt;%memused: This value is kbmemused as a percentage of total memory (excluding swap)&lt;/li&gt;
&lt;li&gt;kbbuffers: buffer in the free command&lt;/li&gt;
&lt;li&gt;kbcached: cache in the free command&lt;/li&gt;
&lt;li&gt;kbcommit: Memory needed to guarantee the current system, i.e., memory needed to ensure no overflow (RAM + swap)&lt;/li&gt;
&lt;li&gt;%commit: This value is kbcommit as a percentage of total memory (including swap)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: sar view memory page status
&lt;code&gt;sar -B 1 3&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pgpgin/s: Kilobytes paged in from disk or SWAP to memory per second&lt;/li&gt;
&lt;li&gt;pgpgout/s: Kilobytes paged out from memory to disk or SWAP per second&lt;/li&gt;
&lt;li&gt;fault/s: Number of page faults per second, i.e., sum of major and minor faults&lt;/li&gt;
&lt;li&gt;majflt/s: Number of major faults per second&lt;/li&gt;
&lt;li&gt;pgfree/s: Number of pages placed on the free queue per second&lt;/li&gt;
&lt;li&gt;pgscank/s: Number of pages scanned by kswapd per second&lt;/li&gt;
&lt;li&gt;pgscand/s: Number of pages directly scanned per second&lt;/li&gt;
&lt;li&gt;pgsteal/s: Number of pages reclaimed from cache to meet memory needs per second&lt;/li&gt;
&lt;li&gt;%vmeff: Pages stolen (pgsteal) as a percentage of total scanned pages (pgscank + pgscand) per second&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: sar view swap information
&lt;code&gt;sar -W 1 3&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Report explanation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pswpin/s: Number of swap pages swapped in per second&lt;/li&gt;
&lt;li&gt;pswpout/s: Number of swap pages swapped out per second&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: sar view historical memory information
&lt;code&gt;sar -B -s &amp;quot;08:00:00&amp;quot; -e &amp;quot;10:00:00&amp;quot;&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#Without -e, it shows information from the start time to now&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ sar -B -s &lt;span style="color:#e6db74"&gt;&amp;#34;08:00:00&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:45:01 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:46:01 PM 414429.37 395024.08 179478.63 0.07 352922.62 12003.78 4266.52 16269.42 99.99
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:47:01 PM 879907.08 337948.43 157970.97 0.02 402290.21 0.00 0.00 0.00 0.00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:48:01 PM 772977.43 507343.30 150255.50 0.05 466742.08 0.00 5821.28 5821.27 100.00&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Above, pgscank represents the speed at which the kswapd process intervenes in memory reclamation, and pgscand represents the speed of direct memory reclamation.&lt;/p&gt;

&lt;h3 class="relative group"&gt;gcore
 &lt;div id="gcore" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gcore" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://man7.org/linux/man-pages/man1/gcore.1.html" target="_blank" rel="noreferrer"&gt;gcore&lt;/a&gt; is part of gdb and can generate a core dump file for a process.&lt;/p&gt;
&lt;p&gt;Example: dump a PostgreSQL backend process:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt; ps -ef|grep &lt;span style="color:#ae81ff"&gt;8296&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg &lt;span style="color:#ae81ff"&gt;8296&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2345&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 09:41 ? 00:00:00 postgres: pg lzldb &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt; cat /proc/8296/smaps |grep Pss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.351562
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt; cat /proc/8296/smaps |grep Rss |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.445312
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl ~&lt;span style="color:#f92672"&gt;]&lt;/span&gt; cat /proc/8296/smaps|sed &lt;span style="color:#e6db74"&gt;&amp;#39;/zero/,/VmFlags/d&amp;#39;&lt;/span&gt; |grep Private |awk &lt;span style="color:#e6db74"&gt;&amp;#39;{sum+=$2 };END {print sum/1024}&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0.0078125&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Process 8296&amp;rsquo;s USS is only 7.8 KB, RSS 445 KB. Dump memory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gcore -o /tmp/dump 8296&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Dumping takes some time, and the dumped file is relatively large, and it will hang the process.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[root&lt;span style="color:#960050;background-color:#1e0010"&gt;@&lt;/span&gt;lzl &lt;span style="color:#ae81ff"&gt;8296&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;#&lt;/span&gt; ls &lt;span style="color:#f92672"&gt;-&lt;/span&gt;lh &lt;span style="color:#f92672"&gt;/&lt;/span&gt;tmp&lt;span style="color:#f92672"&gt;/&lt;/span&gt;dump&lt;span style="color:#ae81ff"&gt;.8296&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#f92672"&gt;-&lt;/span&gt;r&lt;span style="color:#f92672"&gt;--&lt;/span&gt;r&lt;span style="color:#f92672"&gt;--&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; root root &lt;span style="color:#ae81ff"&gt;252&lt;/span&gt;M Jan &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt;tmp&lt;span style="color:#f92672"&gt;/&lt;/span&gt;dump&lt;span style="color:#ae81ff"&gt;.8296&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;gdb
 &lt;div id="gdb" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gdb" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://sourceware.org/gdb/current/onlinedocs/gdb" target="_blank" rel="noreferrer"&gt;gdb&lt;/a&gt; can view specific locations and content in memory.&lt;/p&gt;
&lt;p&gt;Example: view PostgreSQL backend cached data:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Open a new session to query a partitioned table, keeping the session open:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl &lt;span style="color:#f92672"&gt;~&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; psql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;psql (&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;help&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; help.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; now connected &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pg&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appl_no &lt;span style="color:#f92672"&gt;|&lt;/span&gt; is_deleted &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_updated 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+------------+--------------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Use pmap, smaps to view process memory usage and find the memory segment to dump:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 13393&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# pmap -x 13393&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;13393: postgres: pg lzldb &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; idle 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Address Kbytes RSS Dirty Mode Mapping
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000400000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7864&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1204&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;..
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe2ae1b000 &lt;span style="color:#ae81ff"&gt;145968&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2164&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;176&lt;/span&gt; rw-s- zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt; ---RSS takes the most here
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe33ca7000 &lt;span style="color:#ae81ff"&gt;96836&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r---- locale-archive
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b38000 &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b46000 &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw-s- PostgreSQL.3661351388
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b4d000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw-s- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; shmid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;0x8001 &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fbe39b4e000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe3933000 &lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; stack &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;00007fffe397d000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 13393&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# cat /proc/13393/smaps |grep -A 13 zero&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;7fbe2ae1b000-7fbe33ca7000 rw-s &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:04 &lt;span style="color:#ae81ff"&gt;12556&lt;/span&gt; /dev/zero &lt;span style="color:#f92672"&gt;(&lt;/span&gt;deleted&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;145968&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;2164&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;2164&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Clean: &lt;span style="color:#ae81ff"&gt;1988&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Dirty: &lt;span style="color:#ae81ff"&gt;176&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Referenced: &lt;span style="color:#ae81ff"&gt;2164&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Anonymous: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MMUPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;gdb dump memory:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The starting position for dumping memory is the vm address in smaps + &lt;code&gt;0x&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl tmp&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ gdb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; attach &lt;span style="color:#ae81ff"&gt;13393&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; dump memory /tmp/delete.dump 0x7fbe2ae1b000 0x7fbe33ca7000&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="4"&gt;
&lt;li&gt;View the dump file:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;You can simply view it through strings:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;root@lzl 13393&lt;span style="color:#f92672"&gt;]&lt;/span&gt;&lt;span style="color:#75715e"&gt;# strings /tmp/delete.dump|grep lzl|sort|uniq&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; @lzlpartition_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202301_appl_no_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202301_date_created_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202306
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202306_appl_no_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_202306_date_created_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; @lzlpartition_attach
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzlpartition_attach
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; @nk_lzlpartition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;nk_lzlpartition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; * from lzlpartition limit 1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As long as the session queries a partitioned table, all partition and index metadata is cached in the backend process.&lt;/p&gt;
&lt;p&gt;Note:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;gdb attach [pid] will hang the process; do not execute casually&lt;/li&gt;
&lt;li&gt;The dump file size equals VSS, generally much larger than RSS/PSS/USS&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Memory Summary
 &lt;div id="memory-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/be2156c8a394.png" alt="Insert image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Easily Break Through File I/O Bottlenecks: Memory-Mapped mmap Technology &lt;a href="https://blog.51cto.com/u_15481245/6582927" target="_blank" rel="noreferrer"&gt;https://blog.51cto.com/u_15481245/6582927&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Step by Step with Diagrams: Deep Understanding of Linux Physical Memory Management &lt;a href="https://cloud.tencent.com/developer/article/2352771?areaId=106001" target="_blank" rel="noreferrer"&gt;https://cloud.tencent.com/developer/article/2352771?areaId=106001&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Systematically Learning Memory Management from a DBA&amp;rsquo;s Perspective &lt;a href="https://mp.weixin.qq.com/s/CybzGP44dVWQN5hfFrVx7A" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/CybzGP44dVWQN5hfFrVx7A&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://linux2me.wordpress.com/2017/09/15/linux-introduction-to-memory-management/" target="_blank" rel="noreferrer"&gt;https://linux2me.wordpress.com/2017/09/15/linux-introduction-to-memory-management/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Memory management in Linux &lt;a href="https://www.slideshare.net/raghusiddarth/memory-management-in-linux-11551521?from_search=2" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/raghusiddarth/memory-management-in-linux-11551521?from_search=2&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linux Performance Tunning Memory &lt;a href="https://www.slideshare.net/shayc1/linux-performance-tunning-memory?from_search=4" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/shayc1/linux-performance-tunning-memory?from_search=4&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;How to Learn the Linux Kernel (Memory Chapter) &lt;a href="https://mp.weixin.qq.com/s/lKKHH1MMiZbnIbDQt3-IAQ" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/lKKHH1MMiZbnIbDQt3-IAQ&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf" target="_blank" rel="noreferrer"&gt;https://courses.engr.illinois.edu/cs241/sp2014/lecture/09-VirtualMemory_II_sol.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linux Process Virtual Address Space &lt;a href="https://maodanp.github.io/2019/06/02/linux-virtual-space/" target="_blank" rel="noreferrer"&gt;https://maodanp.github.io/2019/06/02/linux-virtual-space/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Red Hat Official Documentation &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-virtualization_tuning_optimization_guide-numa" target="_blank" rel="noreferrer"&gt;https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/virtualization_tuning_and_optimization_guide/chap-virtualization_tuning_optimization_guide-numa&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Data Processing on Modern Hardware &lt;a href="https://db.in.tum.de/teaching/ss21/dataprocessingonmodernhardware/MH_8.pdf?lang=de" target="_blank" rel="noreferrer"&gt;https://db.in.tum.de/teaching/ss21/dataprocessingonmodernhardware/MH_8.pdf?lang=de&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Chapter 2 Describing Physical Memory &lt;a href="https://www.kernel.org/doc/gorman/html/understand/understand005.html" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/gorman/html/understand/understand005.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Various command man pages&lt;/p&gt;
&lt;p&gt;Linux Forced Memory Reclamation, Linux Memory Source Code Analysis - Memory Reclamation (Overall Process) &lt;a href="https://blog.csdn.net/weixin_35094083/article/details/116688112" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_35094083/article/details/116688112&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&amp;lt;Memory compaction &lt;a href="https://lwn.net/Articles/368869/%3E" target="_blank" rel="noreferrer"&gt;https://lwn.net/Articles/368869/&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Memory Journey — How to Improve CMA Utilization? &lt;a href="https://ost.51cto.com/posts/10815" target="_blank" rel="noreferrer"&gt;https://ost.51cto.com/posts/10815&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The implementations of anti pages fragmentation in Linux kernel &lt;a href="https://teawater.github.io/presentation/antif.pdf" target="_blank" rel="noreferrer"&gt;https://teawater.github.io/presentation/antif.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;T H E /proc F I L E S Y S T E M &lt;a href="https://www.kernel.org/doc/Documentation/filesystems/proc.txt" target="_blank" rel="noreferrer"&gt;https://www.kernel.org/doc/Documentation/filesystems/proc.txt&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The /proc/meminfo File in Linux &lt;a href="https://www.baeldung.com/linux/proc-meminfo" target="_blank" rel="noreferrer"&gt;https://www.baeldung.com/linux/proc-meminfo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;the proc filesystem &lt;a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo" target="_blank" rel="noreferrer"&gt;https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-meminfo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Introduction and Usage of Linux /proc/{pid}/maps (Locating Memory Leaks) &lt;a href="https://blog.csdn.net/mijichui2153/article/details/123934531" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/mijichui2153/article/details/123934531&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;CPU and Memory Usage in Linux top Command &lt;a href="https://blog.csdn.net/weixin_45030965/article/details/127693042" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_45030965/article/details/127693042&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;smem memory reporting tool &lt;a href="https://www.selenic.com/smem/" target="_blank" rel="noreferrer"&gt;https://www.selenic.com/smem/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linux performance optimization &lt;a href="https://feiyang233.club/post/linux/" target="_blank" rel="noreferrer"&gt;https://feiyang233.club/post/linux/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;gdb onlinedocs &lt;a href="https://sourceware.org/gdb/current/onlinedocs/gdb" target="_blank" rel="noreferrer"&gt;https://sourceware.org/gdb/current/onlinedocs/gdb&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Linux_Core_Dumps &lt;a href="https://averageradical.github.io/Linux_Core_Dumps.pdf" target="_blank" rel="noreferrer"&gt;https://averageradical.github.io/Linux_Core_Dumps.pdf&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Brief Analysis of PostgreSQL FDW</title><link>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-fdw/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-fdw/</guid><description>&lt;h2 class="relative group"&gt;FDW Basic Concepts
 &lt;div id="fdw-basic-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdw-basic-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What is SQL/MED?
 &lt;div id="what-is-sqlmed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-sqlmed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;SQL/MED aims to unify access methods for heterogeneous data sources. In 2003, SQL/MED was added to the ISO/IEC 9075-9 standard, defined as a SQL standard extension for &lt;strong&gt;managing external data&lt;/strong&gt; via foreign-data wrappers (FDW) or datalink (such as Oracle or PG&amp;rsquo;s dblink). In short, SQL/MED is an international SQL extension standard. Many databases already support SQL/MED, such as DB2, MariaDB, PG, and more.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;FDW Basic Concepts
 &lt;div id="fdw-basic-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdw-basic-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What is SQL/MED?
 &lt;div id="what-is-sqlmed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-sqlmed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;SQL/MED aims to unify access methods for heterogeneous data sources. In 2003, SQL/MED was added to the ISO/IEC 9075-9 standard, defined as a SQL standard extension for &lt;strong&gt;managing external data&lt;/strong&gt; via foreign-data wrappers (FDW) or datalink (such as Oracle or PG&amp;rsquo;s dblink). In short, SQL/MED is an international SQL extension standard. Many databases already support SQL/MED, such as DB2, MariaDB, PG, and more.&lt;/p&gt;
&lt;p&gt;Without SQL/MED, applications must access required data sources themselves and process data at the application layer:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4d2dae15ed42.png" alt="1" /&gt;&lt;/p&gt;
&lt;p&gt;With SQL/MED, the data access architecture becomes clearer:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ab659ea2f77d.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;
&lt;p&gt;However, while this architecture diagram appears simpler, it increases the database&amp;rsquo;s IO and computation pressure. This goes against the modern trend of decoupling computation from the database to the application layer.&lt;/p&gt;
&lt;p&gt;Of course, both approaches have their pros and cons, and SQL/MED is still used in certain scenarios.&lt;/p&gt;
&lt;p&gt;SQL/MED exists as a standard, and PostgreSQL supports the SQL/MED standard excellently through FDW.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What is FDW?
 &lt;div id="what-is-fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0c0845d79809.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL has supported FDW since version 9.1. Users can access external data (foreign data) through regular SQL statements. Foreign data is accessed via a foreign data wrapper (FDW). The FDW in PostgreSQL is itself a library — because different external data sources correspond to different FDW extensions, we often call it an FDW plugin.&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s FDW functionality is extremely powerful: it not only supports multiple data sources but also optimizes data access, and can even be used for &amp;ldquo;beyond expectations&amp;rdquo; purposes, such as implementing cluster functionality.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Installation and Download
 &lt;div id="installation-and-download" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#installation-and-download" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Basically every type of database and data format has its own FDW plugin: oracle_fdw for Oracle databases, mysql_fdw for MySQL databases, and so on. FDW plugins can be installed directly or downloaded:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;FDWs already included as extensions: file_fdw, postgres_fdw, cstore_fdw&lt;/li&gt;
&lt;li&gt;Other FDW plugins can be downloaded from PGXN or the wiki, such as: oracle_fdw, mysql_fdw, json_fdw. Be sure to read the README carefully to understand each FDW&amp;rsquo;s limitations and usage rules.&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;FDW plugin download: &lt;a href="https://pgxn.org/tag/fdw/" target="_blank" rel="noreferrer"&gt;https://pgxn.org/tag/fdw/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;More FDWs (mostly beta): &lt;a href="https://wiki.postgresql.org/wiki/Foreign_data_wrappers" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Foreign_data_wrappers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Write your own FDW: &lt;a href="https://www.postgresql.org/docs/current/fdwhandler.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/fdwhandler.html&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Advantages of FDW over dblink in PG
 &lt;div id="advantages-of-fdw-over-dblink-in-pg" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#advantages-of-fdw-over-dblink-in-pg" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PG also has dblink. FDW and dblink are functionally similar — both access external tables. But FDW has more advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;FDW supports many more data sources (a LOT more). dblink only supports PostgreSQL databases, equivalent to just one FDW plugin — postgres_fdw (which is actually much more powerful).&lt;/li&gt;
&lt;li&gt;Transparent to developers. External tables can be accessed just like regular tables.&lt;/li&gt;
&lt;li&gt;More compliant with standard SQL syntax.&lt;/li&gt;
&lt;li&gt;Better performance in many scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;The functionality provided by this module overlaps substantially with the functionality of the older &lt;a href="https://www.postgresql.org/docs/15/dblink.html" title="F.12. dblink" target="_blank" rel="noreferrer"&gt;dblink&lt;/a&gt; module. But &lt;code&gt;postgres_fdw&lt;/code&gt; provides more transparent and standards-compliant syntax for accessing remote tables, and can give better performance in many cases.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In summary, FDW is stronger than the dblink plugin — you can basically forget about dblink.&lt;/p&gt;

&lt;h2 class="relative group"&gt;FDW&amp;rsquo;s Four Objects
 &lt;div id="fdws-four-objects" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdws-four-objects" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Different FDWs have different usage patterns, but generally all require creating 4 objects: &lt;strong&gt;foreign data wrapper&lt;/strong&gt;, &lt;strong&gt;server&lt;/strong&gt;, &lt;strong&gt;user mapping&lt;/strong&gt;, &lt;strong&gt;foreign table&lt;/strong&gt;. Some objects are not mandatory — for example, file_fdw doesn&amp;rsquo;t need a user mapping, while relational database FDWs generally require one.&lt;/p&gt;

&lt;h3 class="relative group"&gt;foreign data wrapper
 &lt;div id="foreign-data-wrapper" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#foreign-data-wrapper" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After creating the corresponding FDW extension with CREATE EXTENSION, the foreign data wrapper is automatically created.&lt;/p&gt;
&lt;p&gt;For example, creating a file_fdw extension:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; extension file_fdw;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; EXTENSION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Version&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------+---------+------------+------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; file_fdw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;foreign&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; wrapper &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; flat file &lt;span style="color:#66d9ef"&gt;access&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.foreign_data_wrappers;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; foreign_data_wrapper_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_data_wrapper_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; authorization_identifier &lt;span style="color:#f92672"&gt;|&lt;/span&gt; library_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_data_wrapper_language
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------+---------------------------+--------------------------+--------------+-------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; file_fdw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can also create a foreign data wrapper manually without using an extension. See &lt;a href="https://www.postgresql.org/docs/13/sql-createforeigndatawrapper.html" target="_blank" rel="noreferrer"&gt;CREATE FOREIGN DATA WRAPPER&lt;/a&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;server
 &lt;div id="server" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#server" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CREATE SERVER creates an external service, essentially specifying the data source. The OPTIONS syntax varies by foreign-data wrapper — for example, the OPTION syntax for file_fdw and postgres_fdw is definitely different. At this point, you need to read the FDW plugin&amp;rsquo;s README or official documentation. For example:&lt;/p&gt;
&lt;p&gt;Create a file_fdw external service named fileserver:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SERVER fileserver &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATA&lt;/span&gt; WRAPPER file_fdw;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Create a postgres_fdw external service named pgserver, pointing to the lzldb database on a PG instance at 172.0.0.1:5432:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SERVER pgserver &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATA&lt;/span&gt; WRAPPER postgres_fdw &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;host&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;172.0.0.1&amp;#39;&lt;/span&gt;, dbname &lt;span style="color:#e6db74"&gt;&amp;#39;lzldb&amp;#39;&lt;/span&gt;, port &lt;span style="color:#e6db74"&gt;&amp;#39;5432&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View servers:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.foreign_servers;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; foreign_server_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_data_wrapper_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_data_wrapper_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_version &lt;span style="color:#f92672"&gt;|&lt;/span&gt; authorization_identifier
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------+---------------------+------------------------------+---------------------------+---------------------+------------------------+--------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pgserver &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres_fdw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; fileserver &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; file_fdw &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;user mapping
 &lt;div id="user-mapping" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#user-mapping" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;User mapping defines the correspondence between external service users and local users. Therefore, relational database FDWs generally have user mappings, while file-type FDWs without user definitions don&amp;rsquo;t need them.&lt;/p&gt;
&lt;p&gt;For example, create a user mapping using the pgserver from above:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USER&lt;/span&gt; MAPPING &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; localuser SERVER pgserver &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;remoteuser&amp;#39;&lt;/span&gt;, password &lt;span style="color:#e6db74"&gt;&amp;#39;mypasswd&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View user mappings:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.user_mappings;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; authorization_identifier &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------+------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; localuser &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pgserver&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;foreign table
 &lt;div id="foreign-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#foreign-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Foreign tables map remote tables locally, allowing them to be accessed like regular tables. Since local objects are involved and there are many OPTIONS, the full syntax is somewhat complex. See &lt;a href="https://www.postgresql.org/docs/current/sql-createforeigntable.html" target="_blank" rel="noreferrer"&gt;CREATE FOREIGN TABLE&lt;/a&gt;. Simply put, you create a locally corresponding remote table.&lt;/p&gt;
&lt;p&gt;Two common ways to create foreign tables: creation and import.&lt;/p&gt;
&lt;p&gt;Create a foreign table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; localtable (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id char(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name varchar(&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SERVER pgserver &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;remotetable&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Creating foreign tables one by one is tedious — you can import all tables from a remote schema at once:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IMPORT &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SCHEMA&lt;/span&gt; remoteschema &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; SERVER pgserver &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; localschema;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View foreign tables:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; information_schema.foreign_tables; &lt;span style="color:#75715e"&gt;-- Intuitive view of foreign tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_foreign_server; &lt;span style="color:#75715e"&gt;-- Less intuitive, but shows OPTION settings&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Using FDW
 &lt;div id="using-fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Viewing Foreign Table Information
 &lt;div id="viewing-foreign-table-information" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#viewing-foreign-table-information" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;psql&amp;rsquo;s built-in shortcuts are quite clear for viewing the 4 objects of foreign tables, but pay attention to search_path settings:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;psql command&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;\des&lt;/td&gt;
 &lt;td&gt;list foreign servers&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;\deu&lt;/td&gt;
 &lt;td&gt;list user mappings&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;\det&lt;/td&gt;
 &lt;td&gt;list foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;\dtE&lt;/td&gt;
 &lt;td&gt;list both local and foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Foreign table object views/tables can be messy — here&amp;rsquo;s a quick organization:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;foreign data wrapper tables/views&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_foreign_data_wrappers&lt;/td&gt;
 &lt;td&gt;More complete information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_data_wrappers&lt;/td&gt;
 &lt;td&gt;Less information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_data_wrapper_options&lt;/td&gt;
 &lt;td&gt;Targeted query of foreign data wrapper options&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_foreign_data_wrapper&lt;/td&gt;
 &lt;td&gt;Slightly less info, but has permission info that other views lack&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;foreign server tables/views&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_foreign_servers&lt;/td&gt;
 &lt;td&gt;More complete information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_servers&lt;/td&gt;
 &lt;td&gt;Less information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_server_options&lt;/td&gt;
 &lt;td&gt;Targeted option query — one record per option, not per server&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_foreign_server&lt;/td&gt;
 &lt;td&gt;Less information, base table&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;user mapping tables/views&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_user_mappings&lt;/td&gt;
 &lt;td&gt;Fairly complete user mapping information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.user_mappings&lt;/td&gt;
 &lt;td&gt;Less information&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.user_mapping_options&lt;/td&gt;
 &lt;td&gt;Targeted query of UM options&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_user_mappings&lt;/td&gt;
 &lt;td&gt;Slightly less than _pg_user_mappings. Viewable by unprivileged users — passwords show as null&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;pg_user_mapping&lt;/td&gt;
 &lt;td&gt;Less information, base table, mainly options. Inaccessible to unprivileged users&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;foreign table tables/views&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_foreign_tables&lt;/td&gt;
 &lt;td&gt;More complete, shows all foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema._pg_foreign_table_columns&lt;/td&gt;
 &lt;td&gt;Shows column-to-column mappings&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;information_schema.foreign_table_options&lt;/td&gt;
 &lt;td&gt;Targeted display of foreign table options&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;foreign_tables&lt;/td&gt;
 &lt;td&gt;Less information, base table&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These views/tables look messy but actually have a clear structure. The 4 object types all follow the same data dictionary pattern:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6805aee46c58.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pg_xxx are base tables, the foundational information source for the 4 objects&lt;/li&gt;
&lt;li&gt;information_schema._pg_xxx joins pg_xxx base tables with other info — it&amp;rsquo;s a summary view with comprehensive information&lt;/li&gt;
&lt;li&gt;information_schema.xxx is a view on information_schema._pg_xxx, with less information&lt;/li&gt;
&lt;li&gt;information_schema.xxx_options provides targeted option information, sourced only from the full view information_schema._pg_xxx&lt;/li&gt;
&lt;li&gt;A special view: pg_user_mappings, usable even by unprivileged users&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Permission Considerations
 &lt;div id="permission-considerations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#permission-considerations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If you use the postgres superuser throughout to create foreign tables, you&amp;rsquo;ll rarely encounter issues. But in production, application users are typically not superusers. Therefore, permissions are extremely important — not only important but also quite troublesome. Using a regular user for testing is crucial (as with any testing). PG&amp;rsquo;s permission system is like a boss battle — missing any link won&amp;rsquo;t work.&lt;/p&gt;
&lt;p&gt;Key permission points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Foreign data wrapper, server, and user mapping owners are their creators. Users must be granted USAGE privilege or be the owner themselves to use them.&lt;/li&gt;
&lt;li&gt;Accessing remote data sources requires users with appropriate permissions — specified in the user mapping step with suitable remote login credentials.&lt;/li&gt;
&lt;li&gt;After creating/importing foreign tables locally, these objects are treated as local objects (only the data dictionary is local). So PG&amp;rsquo;s local object access permission system must also be properly configured.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;FDW Usage Examples
 &lt;div id="fdw-usage-examples" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdw-usage-examples" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are hundreds of FDW implementations for various data sources worldwide — relational databases, NoSQL databases, various file types, Web Services, columnar storage, big data, and more. Here are a few common FDWs.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Using postgres_fdw
 &lt;div id="using-postgres_fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-postgres_fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;This is probably the most commonly used and most powerful FDW. It allows accessing external PostgreSQL databases from a local database. It can also be used for self-access — this is important because: &lt;strong&gt;PostgreSQL cannot access across databases internally!&lt;/strong&gt; To solve this problem, a good approach is using FDW for cross-database access within the same instance — accessing yourself through an external connection.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s an example of cross-database access using postgres_fdw:&lt;/p&gt;
&lt;p&gt;An instance has two databases: aka and bkb. You can&amp;rsquo;t query both databases in a single SQL statement — databases in PG are logically isolated, somewhat like Oracle 12c PDBs.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[lzl&lt;span style="color:#f92672"&gt;@&lt;/span&gt;postgres]&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; aka &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt;Tc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;CTc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bkb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.UTF&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt;Tc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres&lt;span style="color:#f92672"&gt;=&lt;/span&gt;CTc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although both databases are local, when using FDW we still need the local/remote database concept. Here we treat aka as the local database and bkb as the remote database, enabling access to bkb&amp;rsquo;s tables from aka while handling permission issues.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Install FDW plugin&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; aka
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; extension postgres_fdw;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Note: Extensions are database-level — switch to the local database first.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Grant user permissions&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;usage&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;foreign&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; wrapper postgres_fdw &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; akadata;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Create server&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; aka akadata
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SERVER bkb_server &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATA&lt;/span&gt; WRAPPER postgres_fdw &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;host&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;127.0.0.1&amp;#39;&lt;/span&gt;, port &lt;span style="color:#e6db74"&gt;&amp;#39;5432&amp;#39;&lt;/span&gt;, dbname &lt;span style="color:#e6db74"&gt;&amp;#39;bkb&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;4. Create user mapping&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USER&lt;/span&gt; MAPPING &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; akadata SERVER bkb_server &lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;bkbdata&amp;#39;&lt;/span&gt;, password &lt;span style="color:#e6db74"&gt;&amp;#39;bkbpasswd&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;5. Create schema in aka database, grant to akadata user&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; aka postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; bkb;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;usage&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; bkb &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; akadata;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--GRANT select ON ALL TABLES IN SCHEMA bkb TO akadata;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; bkb &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; akadata;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;6. Import bkb tables&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; aka akadata&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Import entire schema:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;IMPORT &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SCHEMA&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; SERVER bkb_server &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; bkb;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Import a single table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; IMPORT &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SCHEMA&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (tab1) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; SERVER bkb_server &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; bkb&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;7. View foreign tables&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.foreign_tables;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; foreign_table_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_table_schema &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_table_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_catalog &lt;span style="color:#f92672"&gt;|&lt;/span&gt; foreign_server_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------+----------------------+-------------------------------------+------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; aka &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bkb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tab1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; aka &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bkb_server&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Using file_fdw
 &lt;div id="using-file_fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-file_fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The file_fdw extension provides PG with read-only access to external files. file_fdw is already in contrib and can be installed with &lt;code&gt;CREATE EXTENSION&lt;/code&gt;. External files must conform to COPY rules.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a classic example of mapping PG output logs to a foreign table, script from the &lt;a href="https://www.postgresql.org/docs/current/file-fdw.html" target="_blank" rel="noreferrer"&gt;official documentation&lt;/a&gt;:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Create file_fdw extension&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; EXTENSION file_fdw;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2. Create external server&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; SERVER fileserver &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DATA&lt;/span&gt; WRAPPER file_fdw;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Create foreign table&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; pglog (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_time &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; user_name text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; database_name text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; process_id integer,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; connection_from text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; session_id text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; session_line_num bigint,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; command_tag text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; session_start_time &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtual_transaction_id text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transaction_id bigint,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; error_severity text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; sql_state_code text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; message text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; detail text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; hint text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; internal_query text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; internal_query_pos integer,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; context text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; query text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; query_pos integer,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;location&lt;/span&gt; text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; application_name text
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) SERVER fileserver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;OPTIONS&lt;/span&gt; ( filename &lt;span style="color:#e6db74"&gt;&amp;#39;pg_log/postgresql-07-06.csv&amp;#39;&lt;/span&gt;, format &lt;span style="color:#e6db74"&gt;&amp;#39;csv&amp;#39;&lt;/span&gt; );&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;4. Query the log table&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; user_name,database_name,process_id,error_severity,message &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pglog &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; error_severity&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;LOG&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; user_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; database_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; process_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; error_severity &lt;span style="color:#f92672"&gt;|&lt;/span&gt; message
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+---------------+------------+----------------+-----------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appuser1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; db1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;102349&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ERROR &lt;span style="color:#f92672"&gt;|&lt;/span&gt; value too long &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appuser1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; db1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;55378&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ERROR &lt;span style="color:#f92672"&gt;|&lt;/span&gt; value too long &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appuser2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; db2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;219377&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ERROR &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#e6db74"&gt;&amp;#34;dual&amp;#34;&lt;/span&gt; does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; exist&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Deep Dive into postgres_fdw
 &lt;div id="deep-dive-into-postgres_fdw" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#deep-dive-into-postgres_fdw" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;postgres_fdw Performance Optimization
 &lt;div id="postgres_fdw-performance-optimization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgres_fdw-performance-optimization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Unlike most FDW plugins, postgres_fdw is an official plugin maintained by the PostgreSQL Global Development Group, with its source code in contrib. Because external services differ in functionality and structure, some features — such as obtaining remote database access costs or aggregate pushdown in certain scenarios — are difficult to implement in other FDWs. But in postgres_fdw they&amp;rsquo;re achievable. The official team has done extensive optimization for postgres_fdw, making it extremely powerful.&lt;/p&gt;

&lt;h4 class="relative group"&gt;SQL Execution Process
 &lt;div id="sql-execution-process" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-execution-process" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2d6d90fc0f63.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The parser generates a query tree from the foreign table definition.&lt;/li&gt;
&lt;li&gt;The planner connects to the foreign server.&lt;/li&gt;
&lt;li&gt;Obtain cost information. If &lt;code&gt;use_remote_estimate&lt;/code&gt; is true (default), the planner executes EXPLAIN on the remote database to get access costs (step 3); if false, it calculates locally instead.&lt;/li&gt;
&lt;li&gt;Deparse generates remote SQL text. &lt;strong&gt;FDW accesses remote database objects by sending SQL text&lt;/strong&gt; — the planner generates SQL text for remote execution. The &lt;code&gt;Remote SQL&lt;/code&gt; part of the execution plan directly shows the deparsed SQL:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;86&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; ((a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Send SQL statement and receive data. The remote database executes the SQL independently and returns results to the local database based on fetch_size (default 100 rows).&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 class="relative group"&gt;Cost Estimation
 &lt;div id="cost-estimation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cost-estimation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;postgres_fdw can pass remote database object access costs to the local database for calculating the overall SQL execution plan cost. However, simply returning the remote estimated cost isn&amp;rsquo;t enough — the cost of remote access itself must also be considered. postgres_fdw provides 3 OPTIONS to adjust foreign table cost estimation:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;use_remote_estimate&lt;/strong&gt;: When set to true, the planner runs EXPLAIN on the remote database to get estimated costs, adding fdw_startup_cost and fdw_tuple_cost. When false (default), the planner calculates locally and adds fdw_startup_cost and fdw_tuple_cost. Local foreign table statistics may differ from actual values.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fdw_startup_cost&lt;/strong&gt;: Startup cost for foreign tables, default 100. Represents the cost of establishing a connection, parsing, and generating a plan on the external service.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fdw_tuple_cost&lt;/strong&gt;: Additional cost per tuple scanned from a foreign table, default 0.01. Represents data transfer cost — higher latency should mean higher settings.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Aggregate Pushdown
 &lt;div id="aggregate-pushdown" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#aggregate-pushdown" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Aggregate pushdown executes computations on the remote database, with the local database directly receiving the remote execution results. Without aggregate pushdown, all data must be returned to the local database for computation, increasing data transfer&amp;rsquo;s impact on SQL execution efficiency and the local database&amp;rsquo;s computational burden.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(In this environment, bkb.&lt;/em&gt; are all foreign tables, local tables are public.&lt;em&gt;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Predicate Pushdown&lt;/strong&gt;: postgres_fdw supports WHERE pushdown — no need to return all data to the local database.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; f1.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 f1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; f1.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1 f1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; ((a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Sort Pushdown&lt;/strong&gt;: postgres_fdw supports sort pushdown, sending sorts to the remote database.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; f1.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 f1 &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt; nulls &lt;span style="color:#66d9ef"&gt;first&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1 f1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; NULLS &lt;span style="color:#66d9ef"&gt;FIRST&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Join Pushdown&lt;/strong&gt;: Some joins cannot be pushed down, like local table JOIN foreign table — only the foreign table results can be brought locally for joining.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; f1.a,l2.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 f1,tab1 l2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; f1.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;l2.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: f1.a, l2.a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (l2.a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; f1.a)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 l2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: l2.a, l2.b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: f1.a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1 f1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: f1.a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When both tables are foreign tables, joins can be pushed down to the remote database:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; f1.a,f1.b &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 f1 &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; bkb.tab2 f2 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; f1.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;f2.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: f1.a, f1.b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Relations: (bkb.tab1 f1) &lt;span style="color:#66d9ef"&gt;LEFT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; (bkb.tab2 f2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; r1.a, r1.b &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 r1 &lt;span style="color:#66d9ef"&gt;LEFT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab2 r2 &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; (((r1.a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; r2.a))))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Aggregate Function Pushdown&lt;/strong&gt;: Supports pushing down aggregate functions — functions must be &lt;code&gt;IMMUTABLE&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; b,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;avg&lt;/span&gt;(a) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; b;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; GroupAggregate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: b, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;), &lt;span style="color:#66d9ef"&gt;avg&lt;/span&gt;(a)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: tab1.b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a, b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; a, b &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;ASC&lt;/span&gt; NULLS &lt;span style="color:#66d9ef"&gt;LAST&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Some scenarios aren&amp;rsquo;t supported, such as HAVING clauses that can only filter locally:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;,costs &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; b,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;having&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;)&lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; GroupAggregate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: b, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: tab1.b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Foreign&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; bkb.tab1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: a, b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Remote &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.tab1 &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;ASC&lt;/span&gt; NULLS &lt;span style="color:#66d9ef"&gt;LAST&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Other Features
 &lt;div id="other-features" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#other-features" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;Remote Execution OPTION Settings
 &lt;div id="remote-execution-option-settings" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#remote-execution-option-settings" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;extensions&lt;/strong&gt;: User-specified FDW extensions that can use &amp;ldquo;remote computation&amp;rdquo;. Can only be set at the server level.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;fetch_size&lt;/strong&gt;: Number of rows fetched per batch from the remote database, default 100. Can be set at server or table level.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;updatable&lt;/strong&gt;: By default, postgres_fdw foreign tables are updatable. The updatable option can control this. If a foreign table is inherently non-updatable, setting updatable to false at the table level causes errors directly locally.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;truncatable&lt;/strong&gt;: Starting from PG14, postgres_fdw supports truncating foreign tables, controlled by the &lt;code&gt;truncatable&lt;/code&gt; option, defaulting to true.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Connection Management
 &lt;div id="connection-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#connection-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;On the first foreign table access in a session, a connection to the remote database is established. As long as the local session hasn&amp;rsquo;t disconnected, this connection is reused. If multiple user mappings are used, a connection is established for each user mapping.&lt;/p&gt;
&lt;p&gt;Starting from PG14, the &lt;code&gt;keep_connections&lt;/code&gt; option controls this behavior. Defaults to on, meaning the session can reuse this connection later; when off, the connection is closed at transaction end.&lt;/p&gt;
&lt;p&gt;PG14+: &lt;code&gt;postgres_fdw_get_connections()&lt;/code&gt; can view connection status.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Transaction Management
 &lt;div id="transaction-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Important FDW transaction characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The remote database executes SQL based on the text sent by the local database.&lt;/li&gt;
&lt;li&gt;When the local database has SERIALIZABLE isolation level, the remote also uses SERIALIZABLE; otherwise, the remote uses REPEATABLE READ.&lt;/li&gt;
&lt;li&gt;When the local transaction commits or rolls back, the remote transaction also commits or rolls back.&lt;/li&gt;
&lt;li&gt;FDW does not support 2PC transactions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Without distributed 2PC transaction support, partial commits may occur. For example, even if a remote update fails, the local update can still complete:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;123&amp;#39;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42703&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;c&amp;#34;&lt;/span&gt; does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; exist
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LINE &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; bkb.tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;123&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;No Distributed Lock Management
 &lt;div id="no-distributed-lock-management" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#no-distributed-lock-management" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;FDW has no distributed lock management, hence no distributed deadlock detection mechanism.&lt;/p&gt;
&lt;p&gt;Deadlock detection works for local tables but not for foreign tables.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Asynchronous Execution
 &lt;div id="asynchronous-execution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#asynchronous-execution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Starting from PG14, postgres_fdw supports asynchronous execution. When there are multiple Append nodes in the execution plan, they can execute in parallel, improving performance when accessing multiple foreign tables.&lt;/p&gt;
&lt;p&gt;Asynchronous execution only occurs with multiple sessions — i.e., multiple user mappings. The &lt;code&gt;async_capable&lt;/code&gt; option controls this, defaulting to false. The &lt;code&gt;enable_async_append&lt;/code&gt; parameter must also be enabled (default on).&lt;/p&gt;

&lt;h4 class="relative group"&gt;Parallel Commit
 &lt;div id="parallel-commit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#parallel-commit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Starting from PG15, postgres_fdw supports parallel commit. Remote transactions commit alongside local transactions. Without parallel commit/rollback, PG can only commit/rollback remote transactions serially.&lt;/p&gt;

&lt;h3 class="relative group"&gt;postgres_fdw Version History
 &lt;div id="postgres_fdw-version-history" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgres_fdw-version-history" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Version&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Release Support Notes&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;9.3&lt;/td&gt;
 &lt;td style="text-align: left"&gt;postgres_fdw released&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;9.6&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Support pushdown of join, sort, update, delete; fetch_size support&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;10&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Push down aggregate functions to remote server; more join pushdown scenarios&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Push down operators to partitioned tables; UPDATE/DELETE joins can push down&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;12&lt;/td&gt;
 &lt;td style="text-align: left"&gt;More order by/limit pushdown scenarios&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;13&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Enhanced password authentication; pg_dump can export foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;14&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Parallel scanning for queries with multiple foreign tables (async_capable); bulk insert; postgres_fdw_get_connections(); TRUNCATE foreign tables&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;15&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Push down CASE expressions; parallel commit (parallel_commit)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;16&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Interruptible parallel transactions; foreign table analyze_sampling; COPY batch_size; foreign table truncate triggers&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;Sharding Implementation
 &lt;div id="sharding-implementation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sharding-implementation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;FDW-based Sharding
 &lt;div id="fdw-based-sharding" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fdw-based-sharding" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Many PostgreSQL forks (XC/XL, Citus, etc.) have implemented sharding, but PostgreSQL itself is a single-instance database without native sharding support. Since SQL/MED was defined for accessing external data, postgres_fdw can implement sharding by accessing external instances.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Core Sharding Features
 &lt;div id="core-sharding-features" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#core-sharding-features" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Key features needed for usable sharding:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;input checked="" disabled="" type="checkbox"&gt; Partition management — SQL/MED transparency allows sharding on partitioned tables.&lt;/li&gt;
&lt;li&gt;&lt;input checked="" disabled="" type="checkbox"&gt; Partition optimization — partition pruning, PARTITION WISE JOIN, etc.&lt;/li&gt;
&lt;li&gt;&lt;input checked="" disabled="" type="checkbox"&gt; Aggregate pushdown — push computation to shard nodes.&lt;/li&gt;
&lt;li&gt;&lt;input checked="" disabled="" type="checkbox"&gt; Parallel scanning — PG14 implemented.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; 2PC transactions — FDW doesn&amp;rsquo;t yet support this.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Shard management — foreign table partitions must be manually created and added.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Global transactions — global clocks, global snapshot management needed.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Distributed locks — stronger distributed lock mechanisms needed.&lt;/li&gt;
&lt;li&gt;&lt;input disabled="" type="checkbox"&gt; Batch writes — DML/COPY distribution to shards needs batch write support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;PostgreSQL&amp;rsquo;s FDW functionality derives from the SQL/MED standard for accessing external data, supporting many data source types.&lt;/li&gt;
&lt;li&gt;FDW has 4 basic objects: foreign data wrapper, server, user mapping, foreign table.&lt;/li&gt;
&lt;li&gt;postgres_fdw has many feature enhancements and performance optimizations, capable of pushing operators down to remote databases.&lt;/li&gt;
&lt;li&gt;Sharding can be implemented based on postgres_fdw, though some features still need improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql04.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql04.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/13/postgres-fdw.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/postgres-fdw.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/file-fdw.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/file-fdw.html&lt;/a&gt;
&lt;a href="https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/WIP_PostgreSQL_Sharding&lt;/a&gt;
&lt;a href="https://www.percona.com/blog/postgres_fdw-enhancement-in-postgresql-14/" target="_blank" rel="noreferrer"&gt;https://www.percona.com/blog/postgres_fdw-enhancement-in-postgresql-14/&lt;/a&gt;
&lt;a href="https://www.percona.com/blog/foreign-data-wrappers-postgresql-postgres_fdw/" target="_blank" rel="noreferrer"&gt;https://www.percona.com/blog/foreign-data-wrappers-postgresql-postgres_fdw/&lt;/a&gt;
&lt;a href="https://www.percona.com/blog/parallel-commits-for-transactions-using-postgres_fdw-on-postgresql-15/" target="_blank" rel="noreferrer"&gt;https://www.percona.com/blog/parallel-commits-for-transactions-using-postgres_fdw-on-postgresql-15/&lt;/a&gt;
&lt;a href="https://www.enterprisedb.com/blog/postgresql-aggregate-push-down-postgresfdw" target="_blank" rel="noreferrer"&gt;https://www.enterprisedb.com/blog/postgresql-aggregate-push-down-postgresfdw&lt;/a&gt;
&lt;a href="https://www.postgresql.fastware.com/postgresql-insider-fdw-ove" target="_blank" rel="noreferrer"&gt;https://www.postgresql.fastware.com/postgresql-insider-fdw-ove&lt;/a&gt;
&lt;a href="https://momjian.us/main/writings/pgsql/sharding.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/sharding.pdf&lt;/a&gt;
&lt;a href="https://www.slideserve.com/johnna/sql-med-and-more-powerpoint-ppt-presentation" target="_blank" rel="noreferrer"&gt;https://www.slideserve.com/johnna/sql-med-and-more-powerpoint-ppt-presentation&lt;/a&gt;
&lt;a href="https://dbaplus.cn/news-19-2090-1.html" target="_blank" rel="noreferrer"&gt;https://dbaplus.cn/news-19-2090-1.html&lt;/a&gt;
&lt;a href="https://www.highgo.ca/2019/08/08/horizontal-scalability-with-sharding-in-postgresql-where-it-is-going-part-3-of-3/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2019/08/08/horizontal-scalability-with-sharding-in-postgresql-where-it-is-going-part-3-of-3/&lt;/a&gt;
&lt;a href="https://www.highgo.ca/2021/06/28/parallel-execution-of-postgres_fdw-scans-in-pg-14-important-step-forward-for-horizontal-scaling/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2021/06/28/parallel-execution-of-postgres_fdw-scans-in-pg-14-important-step-forward-for-horizontal-scaling/&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Brief Analysis of PostgreSQL Memory</title><link>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-memory/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-memory/</guid><description>&lt;h2 class="relative group"&gt;Architecture
 &lt;div id="architecture" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#architecture" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8ca0ab97a875.png" alt="Shared Memory in PostgreSQL" /&gt;
(&lt;a href="https://www.postgresql.fastware.com/blog/lets-get-back-to-basics-postgresql-memory-components" target="_blank" rel="noreferrer"&gt;https://www.postgresql.fastware.com/blog/lets-get-back-to-basics-postgresql-memory-components&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6ec5a1dae77e.png" alt="PostgreSQL Process Structure and Memory Structure - Figure 2" /&gt;
(&lt;a href="http://geekdaxue.co/read/fcant@sql/qts5is" target="_blank" rel="noreferrer"&gt;http://geekdaxue.co/read/fcant@sql/qts5is&lt;/a&gt;)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Shared Memory
 &lt;div id="shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Linux Shared Memory Implementation
 &lt;div id="linux-shared-memory-implementation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#linux-shared-memory-implementation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/026fc1403eb5.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://momjian.us/main/writings/pgsql/inside_shmem.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/inside_shmem.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared Memory on Linux&lt;/strong&gt;
Shared memory is an IPC (Inter-Process Communication) mechanism supported by Unix-based operating systems (including Linux). It is a type of memory that multiple processes can simultaneously use to communicate with each other. Shared memory is one of the fastest IPC mechanisms because it does not require processes to copy data between each other. Processes can access shared memory through their own address space.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Architecture
 &lt;div id="architecture" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#architecture" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8ca0ab97a875.png" alt="Shared Memory in PostgreSQL" /&gt;
(&lt;a href="https://www.postgresql.fastware.com/blog/lets-get-back-to-basics-postgresql-memory-components" target="_blank" rel="noreferrer"&gt;https://www.postgresql.fastware.com/blog/lets-get-back-to-basics-postgresql-memory-components&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6ec5a1dae77e.png" alt="PostgreSQL Process Structure and Memory Structure - Figure 2" /&gt;
(&lt;a href="http://geekdaxue.co/read/fcant@sql/qts5is" target="_blank" rel="noreferrer"&gt;http://geekdaxue.co/read/fcant@sql/qts5is&lt;/a&gt;)&lt;/p&gt;

&lt;h2 class="relative group"&gt;Shared Memory
 &lt;div id="shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Linux Shared Memory Implementation
 &lt;div id="linux-shared-memory-implementation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#linux-shared-memory-implementation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/026fc1403eb5.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://momjian.us/main/writings/pgsql/inside_shmem.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/inside_shmem.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared Memory on Linux&lt;/strong&gt;
Shared memory is an IPC (Inter-Process Communication) mechanism supported by Unix-based operating systems (including Linux). It is a type of memory that multiple processes can simultaneously use to communicate with each other. Shared memory is one of the fastest IPC mechanisms because it does not require processes to copy data between each other. Processes can access shared memory through their own address space.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Two Forms of Shared Memory&lt;/strong&gt;
One form of shared memory is memory-mapped files. Once multiple processes map the same file into their address space, they can access the file&amp;rsquo;s contents and simultaneously update the file directly using the mapped memory. Another form of shared memory is anonymous memory. This refers to shared memory regions allocated by programs without associating them with a file or persistent storage mechanism.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;mmap()&lt;/strong&gt;
Mapping a file into a process&amp;rsquo;s address space uses &lt;code&gt;mmap()&lt;/code&gt;. Anonymous memory can also be created with &lt;code&gt;mmap()&lt;/code&gt;. &lt;a href="https://www.man7.org/linux/man-pages/man2/mmap.2.html" target="_blank" rel="noreferrer"&gt;mmap&lt;/a&gt; is part of the standard C library. For anonymous memory, the flags should be &lt;code&gt;MAP_ANONYMOUS&lt;/code&gt; or &lt;code&gt;MAP_ANON&lt;/code&gt;, in which case &lt;code&gt;fd&lt;/code&gt; should be NULL or -1, and &lt;code&gt;offset&lt;/code&gt; should be 0.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fcd702da523d.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.tutorialsdaddy.com/courses/linux-device-driver/lessons/mmap/" target="_blank" rel="noreferrer"&gt;http://www.tutorialsdaddy.com/courses/linux-device-driver/lessons/mmap/&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Shared Memory in PostgreSQL
 &lt;div id="shared-memory-in-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shared-memory-in-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0a37e863fe80.png" alt="Image" /&gt;
&lt;a href="https://www.interdb.jp/pg/pgsql02.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql02.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL has many types of shared memory: shared buffers, WAL buffer, CLOG buffer, lock space, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Shared Buffer&lt;/strong&gt;
The shared memory area where PostgreSQL caches data, similar to Oracle&amp;rsquo;s SGA. When data hits the shared buffer, it is read directly from memory without requiring disk I/O.
PostgreSQL loads table pages and indexes from persistent storage into this area and operates on them directly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;WAL Buffer&lt;/strong&gt;
To ensure no data is lost in the event of a server failure, PostgreSQL supports the WAL mechanism. WAL data (also called XLOG records) is PostgreSQL&amp;rsquo;s transaction log. The WAL BUFFER is the buffer for WAL data before it is written to persistent storage.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CLOG BUFFER&lt;/strong&gt;
The Commit Log (CLOG) maintains the status of all transactions (e.g., in_progress, committed, aborted) for the concurrency control mechanism. The corresponding CLOG BUFFER is the buffer for CLOG data before it is written to disk.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PostgreSQL Shared Memory Parameters
 &lt;div id="postgresql-shared-memory-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-shared-memory-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;shared_buffers&lt;/code&gt;&lt;/strong&gt;
Default 128MB. Recommended to configure at 25% of total memory. Because PostgreSQL&amp;rsquo;s private memory generally takes up a significant portion and relies on cache, sufficient memory must be left for the OS. It is therefore not recommended to set this to as high a value (relative to total memory) as you would for Oracle&amp;rsquo;s SGA.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;shared_memory_type&lt;/code&gt;&lt;/strong&gt;
Specifies the shared memory implementation method, not only for shared_buffers but also for other shared data areas.
The shared memory implementation varies by platform. (It appears) on Linux the default is &lt;code&gt;mmap&lt;/code&gt;. Other values are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;posix&lt;/code&gt; (for POSIX shared memory allocated using &lt;code&gt;shm_open&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sysv&lt;/code&gt; (for System V shared memory allocated via &lt;code&gt;shmget&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;windows&lt;/code&gt; (for Windows shared memory)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mmap&lt;/code&gt; (to simulate shared memory using memory-mapped files stored in the data directory)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;By default, PostgreSQL uses a very small amount of System V shared memory, with the vast majority being mmap shared memory. Due to &lt;a href="https://postgreshelp.com/postgresql-dynamic-shared-memory-posix-vs-mmap/" target="_blank" rel="noreferrer"&gt;differences between POSIX and System V IPC&lt;/a&gt;, signal implementations differ. The &lt;code&gt;shared_memory_type&lt;/code&gt; parameter can be explicitly adjusted for the IPC implementation mechanism:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/kernel-resources.html#SYSVIPC" target="_blank" rel="noreferrer"&gt;Setting System V IPC&lt;/a&gt; (default is &lt;code&gt;mmap&lt;/code&gt;):
On Linux and FreeBSD systems, the default shared memory system settings are generally sufficient. Setting &lt;code&gt;shared_memory_type&lt;/code&gt; to &lt;code&gt;sysv&lt;/code&gt; does not take effect on these two platforms (System V semaphores are not used on this platform).
On OpenBSD systems, if &lt;code&gt;shared_memory_type&lt;/code&gt; is set to &lt;code&gt;sysv&lt;/code&gt;, the default shared memory system parameters are insufficient and need to be adjusted via sysctl.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Setting POSIX IPC:
POSIX semaphores are effective on Linux and FreeBSD.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;dynamic_shared_memory_type&lt;/code&gt;&lt;/strong&gt;
The mechanism for dynamic shared memory, defaults to &lt;code&gt;posix&lt;/code&gt;. This parameter is important for parallel queries. A &lt;a href="https://www.postgresql.org/message-id/CA%2BhUKGJOj7qzDLxeFPVvto8YEWop6FSQoTYPO9Z6Ee%3Di-nPS_Q%40mail.gmail.com" target="_blank" rel="noreferrer"&gt;community email about /dev/shm&lt;/a&gt; describes:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;PostgreSQL creates segments in /dev/shm for parallel queries (via&lt;br&gt;
shm_open()), not for shared buffers. The amount used is controlled by&lt;br&gt;
work_mem. Queries can use up to work_mem for each node you see in the&lt;br&gt;
EXPLAIN plan, and for each process, so it can be quite a lot if you&lt;br&gt;
have lots of parallel worker processes and/or lots of&lt;br&gt;
tables/partitions being sorted or hashed in your query.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Translation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Parallel queries use POSIX and create segments in &lt;code&gt;/dev/shm&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Parallel queries do NOT use &lt;code&gt;shared_buffers&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Each plan node in a query is limited by &lt;code&gt;work_mem&lt;/code&gt;!&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;min_dynamic_shared_memory&lt;/code&gt;&lt;/strong&gt;
The initial size of memory used by parallel queries, allocated at server startup. Related to &lt;code&gt;huge_pages&lt;/code&gt; and &lt;code&gt;dynamic_shared_memory_type&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;huge_pages&lt;/code&gt;&lt;/strong&gt;
This parameter controls whether the &lt;strong&gt;main shared memory area&lt;/strong&gt; uses huge pages. This means private memory and OS-level memory are not affected by this setting. PostgreSQL&amp;rsquo;s use of huge pages is currently only supported on Linux and Windows systems; on Linux systems, it is only supported when &lt;code&gt;shared_memory_type&lt;/code&gt; is set to &lt;code&gt;mmap&lt;/code&gt;!&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Setting&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;try&lt;/td&gt;
 &lt;td&gt;default, attempts to allocate huge pages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;on&lt;/td&gt;
 &lt;td&gt;uses huge pages; server will not start if allocation fails&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;off&lt;/td&gt;
 &lt;td&gt;does not use huge pages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;huge_page_size&lt;/code&gt;&lt;/strong&gt;
Controls the size of huge pages. Default is 0, meaning PostgreSQL uses the huge page size provided by the operating system. Setting a non-default value is only supported on Linux.&lt;/p&gt;

&lt;h3 class="relative group"&gt;The pg_shmem_allocations View
 &lt;div id="the-pg_shmem_allocations-view" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pg_shmem_allocations-view" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;pg_shmem_allocations&lt;/code&gt; is a view introduced in PG13 that allows viewing the allocation of major shared memory segments, including those from PostgreSQL itself and extensions.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sum&lt;/span&gt;(allocated_size)&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt; gb &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_shmem_allocations;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; gb 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;7658920288085938&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_shmem_allocations &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;desc&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; allocated_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------+------------+------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffer Blocks &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38575360&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2415919104&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2415919104&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2729553280&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240300672&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240300672&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;anonymous&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240198528&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;240198528&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffer Descriptors &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19700992&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18874368&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18874368&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; XLOG Ctl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;171008&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16803472&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16803584&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Backend Activity Buffer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2707733248&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10680320&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10680320&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;NULL indicates unused memory, &lt;code&gt;anonymous&lt;/code&gt; indicates anonymous page allocations.
Most of the memory modules in the &lt;code&gt;pg_shmem_allocations&lt;/code&gt; view are difficult to understand. You can find them by searching the source code, but there is no intuitive explanation — it simply displays the data from the source code&amp;rsquo;s init memory module.&lt;/p&gt;
&lt;p&gt;Example: Buffer Blocks:
Searching the source code directly for &amp;ldquo;buffer blocks&amp;rdquo;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Initialize shared buffer pool
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Called only once, during shared memory initialization
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;InitBufferPool&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		foundBufs,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				foundDescs,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				foundIOCV,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				foundBufCkpt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Align descriptors to a cacheline boundary. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BufferDescriptors &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (BufferDescPadded &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShmemInitStruct&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Buffer Descriptors&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						NBuffers &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(BufferDescPadded),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;foundDescs);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BufferBlocks &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShmemInitStruct&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Buffer Blocks&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						NBuffers &lt;span style="color:#f92672"&gt;*&lt;/span&gt; (Size) BLCKSZ, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;foundBufs);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Align condition variables to cacheline boundary. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BufferIOCVArray &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (ConditionVariableMinimallyPadded &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShmemInitStruct&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Buffer IO Condition Variables&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						NBuffers &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(ConditionVariableMinimallyPadded),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;foundIOCV);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Checkpoint BufferIds are used to sort checkpoints in shared memory
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CkptBufferIds &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (CkptSortItem &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ShmemInitStruct&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Checkpoint BufferIds&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						NBuffers &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(CkptSortItem), &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;foundBufCkpt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;InitBufferPool()&lt;/code&gt; function initializes the shared buffer.&lt;/li&gt;
&lt;li&gt;The shared buffer has 4 sub-pools: Buffer Descriptors, Buffer Blocks, Buffer IO Condition Variables, Checkpoint BufferIds.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Private Memory
 &lt;div id="private-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#private-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Private memory is memory areas allocated by PostgreSQL for each session or process. Unlike shared buffers, there is not just one. Private memory of each process cannot be accessed by other processes.



&lt;img src="https://lastdba.com/img/csdn/b9b739d63ed8.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;temp_buffers&lt;/code&gt;&lt;/strong&gt;
Temp buffers are used to cache temporary table data, default 8MB. temp_buffers is private memory, so temporary tables are only visible to the current session.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;work_mem&lt;/code&gt;&lt;/strong&gt;
The maximum memory used by query operations, such as sorts and hash tables. Default 4MB.
&lt;em&gt;Each query or each plan node?&lt;/em&gt;
&lt;a href="https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM" target="_blank" rel="noreferrer"&gt;Official documentation&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Note that a complex query might perform several sort and hash operations at the same time, with each operation generally being allowed to use as much memory as this value specifies before it starts to write data into temporary files.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/CA%2BhUKGJOj7qzDLxeFPVvto8YEWop6FSQoTYPO9Z6Ee%3Di-nPS_Q%40mail.gmail.com" target="_blank" rel="noreferrer"&gt;Community email about /dev/shm&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Queries can use up to work_mem for each node you see in the&lt;br&gt;
EXPLAIN plan,&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;em&gt;This parameter applies to each operation (plan node) in a query, not to each query.&lt;/em&gt; A query can have many parallel operations, so a single query can also consume a lot of memory. Therefore, the &lt;code&gt;work_mem&lt;/code&gt; setting must be made very carefully to avoid exhausting OS memory. The worst case: multiple sessions, each session having multiple plan nodes, and those plan nodes using operations that heavily consume work_mem.
&lt;em&gt;Which operations use work_mem?&lt;/em&gt;
For sort operations: ORDER BY, DISTINCT, merge joins. For hash table usage: hash joins, hash-based aggregation, memoize nodes, hash-based IN subqueries.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;hash_mem_multiplier&lt;/code&gt;&lt;/strong&gt;
Used to limit the memory size of hash-based operations. The limit is &lt;code&gt;hash_mem_multiplier&lt;/code&gt; * &lt;code&gt;work_mem&lt;/code&gt;. &lt;code&gt;hash_mem_multiplier&lt;/code&gt; defaults to 2.
Although work_mem can be limited, you cannot limit how many hash operations a query uses, so PG13 added this parameter. This means that before version 12 (inclusive), it was very difficult to limit hash table memory.
&lt;em&gt;In our 9.6 production environment, we found a single session consuming 300GB of memory. The culprit was the lack of hash table limits in older versions combined with an execution plan that incorrectly used hash tables.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;maintenance_work_mem&lt;/code&gt;&lt;/strong&gt;
Memory area used by operations such as &lt;code&gt;VACUUM&lt;/code&gt;, &lt;code&gt;CREATE INDEX&lt;/code&gt;, and &lt;code&gt;ALTER TABLE ADD FOREIGN KEY&lt;/code&gt;. These are session-initiated operations with independent processes that use private memory. These maintenance operations cannot run in parallel within a single session, and concurrency is generally low, so this parameter can be set relatively high.
Autovacuum may also use this memory area and limit. See &lt;code&gt;autovacuum_work_mem&lt;/code&gt; explanation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;autovacuum_work_mem&lt;/code&gt;&lt;/strong&gt;
Maximum memory used by each autovacuum worker process. Default -1, meaning the &lt;code&gt;maintenance_work_mem&lt;/code&gt; parameter is used to limit autovacuum workers. Vacuum uses at most 1GB of memory, and autovacuum has the same limit, so setting the vacuum/autovacuum memory limit above 1GB is meaningless.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;vacuum_buffer_usage_limit&lt;/code&gt;&lt;/strong&gt;
Limits the number of pages that &lt;code&gt;VACUUM&lt;/code&gt; and &lt;code&gt;ANALYZE&lt;/code&gt; can access from shared memory, to prevent too many pages from being evicted. Default is 256KB, 0 means no limit.
When using &lt;code&gt;VACUUM&lt;/code&gt; or &lt;code&gt;ANALYZE&lt;/code&gt; commands, &lt;code&gt;BUFFER_USAGE_LIMIT&lt;/code&gt; can be specified, which takes precedence over the GUC parameter &lt;code&gt;vacuum_buffer_usage_limit&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;max_stack_depth&lt;/code&gt;&lt;/strong&gt;
The maximum safe depth of the execution stack, generally meaning the stack depth of a recursive function executed on a single backend process. Default is 2MB. The OS kernel stack limit should be set slightly larger than &lt;code&gt;max_stack_depth&lt;/code&gt;.
If a recursive function exceeds the stack depth, the following error is reported:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: stack depth limit exceeded HINT: 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Increase the configuration parameter max_stack_depth &lt;span style="color:#f92672"&gt;(&lt;/span&gt;currently 2048kB&lt;span style="color:#f92672"&gt;)&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;after ensuring the platform&lt;span style="color:#960050;background-color:#1e0010"&gt;&amp;#39;&lt;/span&gt;s stack depth limit is adequate.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;logical_decoding_work_mem&lt;/code&gt;&lt;/strong&gt;
Before PG13, logical decoding would retain at most 4096 changes in memory (&lt;code&gt;max_changes_in_memory&lt;/code&gt; hardcoded in the source). PG13 introduced the parameter &lt;code&gt;logical_decoding_work_mem&lt;/code&gt;. If the data held by logical decoding exceeds this memory value, it is written to disk. Default 64MB.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;each replication connection only uses a single buffer of this size,&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Generally, the number of logical replication connections is not large, so &lt;code&gt;logical_decoding_work_mem&lt;/code&gt; can be set relatively high without issues.&lt;/p&gt;

&lt;h2 class="relative group"&gt;xxCache
 &lt;div id="xxcache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#xxcache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;xxCache is also private memory.&lt;/strong&gt; For example, PostgreSQL caches relation metadata in relcache. The official documentation has relatively little description about this, but PostgreSQL memory problems are often related to it.
For instance, the issue of catalog cache causing each backend process to consume a lot of memory without releasing it has appeared in many environments. Here is a &lt;a href="https://www.postgresql.org/message-id/flat/20160708012833.1419.89062%40wrigleys.postgresql.org#20160708012833.1419.89062@wrigleys.postgresql.org" target="_blank" rel="noreferrer"&gt;community email from 2016 by Digoal about catalog cache consuming excessive memory&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Every PostgreSQL session holds system data in own cache. Usually this cache is pretty small (for significant numbers of users). But can be pretty big if your catalog is untypically big and you touch almost all objects from&lt;br&gt;
catalog in session. A implementation of this cache is simple - there is not&lt;br&gt;
delete or limits. There is not garabage collector (and issue related to&lt;br&gt;
GC), what is great, but the long sessions on big catalog can be problem.&lt;br&gt;
The solution is simple - close session over some time or over some number of operations. Then all memory in caches will be released.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The community&amp;rsquo;s explanation of catalog cache:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each session has its own cache for storing system data (metadata, etc.)&lt;/li&gt;
&lt;li&gt;Generally, this cache is small. When the catalog is large and a session has accessed all catalog objects, the cache can become very large.&lt;/li&gt;
&lt;li&gt;Cache management is simple: &lt;strong&gt;there is no deletion mechanism or limit&lt;/strong&gt; (though invalidation messages do exist).&lt;/li&gt;
&lt;li&gt;Closing the session releases the cache.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tom Lane&amp;rsquo;s solution was also simple and blunt — add more hardware resources:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;I do not think you should complain if that takes a great deal of memory. Either rethink why you need so many tables, or buy hardware commensurate with the size of your problem.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In fact, there are many knowledge points about caches worth paying attention to. After understanding their principles, the solutions to cache-caused memory issues may not be limited to just one approach.
There are many types of xxCache, such as relcache, syscache, plancache, etc. Since documentation is scarce, understanding xxCache requires reading the source code. The main xxCache source code is under &lt;code&gt;src/backend/utils/cache&lt;/code&gt;.
&lt;em&gt;Source structure&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inval.c				&lt;span style="color:#f92672"&gt;--&lt;/span&gt; Invalidation message dispatcher &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; private caches. The corresponding shared cache invalidation message handler is sinval.c
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relfilenodemap.c	&lt;span style="color:#f92672"&gt;--&lt;/span&gt; relfilenode to oid mapping cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ts_cache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; Cache &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;Tsearch&lt;/span&gt; (text search) related objects
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relmapper.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; catalog to relfilenode mapping cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;typcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; type cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;spccache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; tablespace cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;evtcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; event trigger cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;attoptcache.c		&lt;span style="color:#f92672"&gt;--&lt;/span&gt; attribute cache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;plancache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; plan cache 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; relation cache 							 								&lt;span style="color:#f92672"&gt;*&lt;/span&gt;Focus of this article&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;catcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; system catalog cache 					 						&lt;span style="color:#f92672"&gt;*&lt;/span&gt;Focus of this article&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;syscache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; one layer above catcache, also system catalog cache	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;Focus of this article&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lsyscache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; routines &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; conveniently querying catalog cache, &lt;span style="color:#e6db74"&gt;&amp;#39;l&amp;#39;&lt;/span&gt; likely stands &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; lookup
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;partcache.c			&lt;span style="color:#f92672"&gt;--&lt;/span&gt; routines &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; operating on partition information in relcache&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In addition to handling various caches, there is also source code for operations and messages. Below we focus on relcache, catcache/syscache, and invalidation messages.&lt;/p&gt;

&lt;h3 class="relative group"&gt;relcache
 &lt;div id="relcache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#relcache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What data does a relcache entry store?&lt;/strong&gt;
Defined in &lt;code&gt;src/include/utils/rel.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; POSTGRES relation &lt;span style="color:#a6e22e"&gt;descriptor&lt;/span&gt; (a&lt;span style="color:#f92672"&gt;/&lt;/span&gt;k&lt;span style="color:#f92672"&gt;/&lt;/span&gt;a relcache entry) definitions.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationData&lt;/code&gt; is the primary data structure for relcache entries:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; RelationData
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RelFileNode rd_node;		&lt;span style="color:#75715e"&gt;/* physical identifier of relation */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SMgrRelation rd_smgr;		&lt;span style="color:#75715e"&gt;/* cached file handle, or NULL */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			rd_refcnt;		&lt;span style="color:#75715e"&gt;/* reference count */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BackendId	rd_backend;		&lt;span style="color:#75715e"&gt;/* if temp relation, the owning backend id */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_islocaltemp; &lt;span style="color:#75715e"&gt;/* is it a temp rel of the current session */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_isnailed;	&lt;span style="color:#75715e"&gt;/* is it nailed in cache */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_isvalid;		&lt;span style="color:#75715e"&gt;/* is the relcache entry valid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_indexvalid;	&lt;span style="color:#75715e"&gt;/* are the indexes on the relation valid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_statvalid;	&lt;span style="color:#75715e"&gt;/* are the statistics on the relation valid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* some subtransaction info */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SubTransactionId rd_createSubid;	&lt;span style="color:#75715e"&gt;/* rel was created in current xact */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SubTransactionId rd_newRelfilenodeSubid;	&lt;span style="color:#75715e"&gt;/* highest subxact changing rd_node to current value */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SubTransactionId rd_firstRelfilenodeSubid;	&lt;span style="color:#75715e"&gt;/* highest subxact changing rd_node to any value */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SubTransactionId rd_droppedSubid;	&lt;span style="color:#75715e"&gt;/* dropped with another Subid set */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Form_pg_class rd_rel;		&lt;span style="color:#75715e"&gt;/* pointer to the relation&amp;#39;s pg_class tuple */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TupleDesc	rd_att;			&lt;span style="color:#75715e"&gt;/* tuple descriptor */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			rd_id;			&lt;span style="color:#75715e"&gt;/* relation&amp;#39;s oid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	LockInfoData rd_lockInfo;	&lt;span style="color:#75715e"&gt;/* lock info on the relation */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RuleLock &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_rules;		&lt;span style="color:#75715e"&gt;/* rewrite rules */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext rd_rulescxt;	&lt;span style="color:#75715e"&gt;/* private memory cxt for rd_rules */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TriggerDesc &lt;span style="color:#f92672"&gt;*&lt;/span&gt;trigdesc;		&lt;span style="color:#75715e"&gt;/* trigger info, NULL if none */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* foreign key info */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_fkeylist;	&lt;span style="color:#75715e"&gt;/* list of ForeignKeyCacheInfo (see below) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		rd_fkeyvalid;	&lt;span style="color:#75715e"&gt;/* true if list has been computed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* partition info */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PartitionKey rd_partkey;	&lt;span style="color:#75715e"&gt;/* partition key, or NULL */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext rd_partkeycxt;	&lt;span style="color:#75715e"&gt;/* private context for rd_partkey, if any */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_indexlist;	&lt;span style="color:#75715e"&gt;/* list of all index OIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			rd_pkindex;		&lt;span style="color:#75715e"&gt;/* primary key oid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			rd_replidindex; &lt;span style="color:#75715e"&gt;/* replica identity index oid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_statlist;	&lt;span style="color:#75715e"&gt;/* list of extended stats OIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PublicationDesc &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_pubdesc;	&lt;span style="color:#75715e"&gt;/* publication descriptor, or NULL */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	bytea	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_options;		&lt;span style="color:#75715e"&gt;/* parsed pg_class.reloptions */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Form_pg_index rd_index;		&lt;span style="color:#75715e"&gt;/* index descriptor in pg_index tuple */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; HeapTupleData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_indextuple;	&lt;span style="color:#75715e"&gt;/* all pg_index tuples */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext rd_indexcxt;	&lt;span style="color:#75715e"&gt;/* index cxt */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_amcache;		&lt;span style="color:#75715e"&gt;/* available for use by index/table AM */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; FdwRoutine &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rd_fdwroutine;	&lt;span style="color:#75715e"&gt;/* cached function pointers, or NULL */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} RelationData;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationData&lt;/code&gt; contains a large amount of relation-related metadata: oid, pg_class, partition tables, subtransactions, row security policies, statistics, index metadata, AM, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;relcache ROUTINES&lt;/strong&gt;
The ROUTINES source code is located at &lt;code&gt;src/backend/utils/cache/relcache.c&lt;/code&gt;.
There are mainly 5 stages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;RelationCacheInitialize&lt;/code&gt; - Initialize relcache, initially empty&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RelationCacheInitializePhase2&lt;/code&gt; - Initialize shared catalogs&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RelationCacheInitializePhase3&lt;/code&gt; - Complete relcache initialization&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RelationIdGetRelation&lt;/code&gt; - Get relation descriptor by relation id&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RelationClose&lt;/code&gt; - Close a relation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These 5 stages are the 5 main logical steps for a rel entry, equivalent to the lifecycle of a rel entry, not the lifecycle of relcache. The first three stages are all relcache initialization — they initialize relcache and load some system tables and their indexes. The last two stages are the logic for obtaining a reldesc and closing a relation; the relcache itself still exists.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Stage 1&lt;/em&gt;: &lt;code&gt;RelationCacheInitialize&lt;/code&gt;
&lt;code&gt;RelationCacheInitialize&lt;/code&gt; initializes relcache:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Define initial size 400
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INITRELCACHESIZE		400
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationCacheInitialize&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HASHCTL		ctl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			allocsize;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * make sure cache memory context exists
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Check if cache mctx exists, create one if not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;CacheMemoryContext)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CreateCacheMemoryContext&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Create hash table indexed by OID for relcache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ctl.keysize &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(Oid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ctl.entrysize &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(RelIdCacheEnt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RelationIdCache &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;hash_create&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Relcache by OID&amp;#34;&lt;/span&gt;, INITRELCACHESIZE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ctl, HASH_ELEM &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH_BLOBS);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Initialize relation mapper
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationMapInitialize&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationCacheInitialize&lt;/code&gt; does not allocate any relation operations; it only initializes relcache memory, hash tables, mappers, etc.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Stage 2&lt;/em&gt;: &lt;code&gt;RelationCacheInitializePhase2&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationCacheInitializePhase2&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext oldcxt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Initialize relation mapper
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationMapInitializePhase2&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If in bootstrap mode, shared catalogs don&amp;#39;t exist yet, so do nothing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsBootstrapProcessingMode&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Switch to current cache mctx
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	oldcxt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(CacheMemoryContext);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Try to load shared relcache file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;load_relcache_init_file&lt;/span&gt;(true)) &lt;span style="color:#75715e"&gt;// If init file not loaded
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_database&amp;#34;&lt;/span&gt;, DatabaseRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_database, Desc_pg_database);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_authid&amp;#34;&lt;/span&gt;, AuthIdRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_authid, Desc_pg_authid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_auth_members&amp;#34;&lt;/span&gt;, AuthMemRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_auth_members, Desc_pg_auth_members);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_shseclabel&amp;#34;&lt;/span&gt;, SharedSecLabelRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_shseclabel, Desc_pg_shseclabel);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_subscription&amp;#34;&lt;/span&gt;, SubscriptionRelation_Rowtype_Id, true,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_subscription, Desc_pg_subscription);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_SHARED_RELS	5	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(oldcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The init file is divided into shared and local cache init files. &lt;code&gt;load_relcache_init_file()&lt;/code&gt; attempts to load data from these two types of files into relcache (here it should only load the shared ones). If loading fails, it creates descriptors for the 5 basic system tables: &lt;code&gt;pg_database&lt;/code&gt;, &lt;code&gt;pg_authid&lt;/code&gt;, etc.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Stage 3&lt;/em&gt;:
&lt;code&gt;RelationCacheInitializePhase3&lt;/code&gt; is the third stage of initialization and contains the most content:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationCacheInitializePhase3&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HASH_SEQ_STATUS status;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RelIdCacheEnt &lt;span style="color:#f92672"&gt;*&lt;/span&gt;idhentry;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext oldcxt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		needNewCacheFile &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;criticalSharedRelcachesBuilt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationMapInitializePhase3&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Switch to CacheMemoryContext
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	oldcxt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(CacheMemoryContext);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Like stage 2, load more system table descriptors
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsBootstrapProcessingMode&lt;/span&gt;() &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;load_relcache_init_file&lt;/span&gt;(false))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		needNewCacheFile &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_class&amp;#34;&lt;/span&gt;, RelationRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_class, Desc_pg_class);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_attribute&amp;#34;&lt;/span&gt;, AttributeRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_attribute, Desc_pg_attribute);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_proc&amp;#34;&lt;/span&gt;, ProcedureRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_proc, Desc_pg_proc);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_type&amp;#34;&lt;/span&gt;, TypeRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_type, Desc_pg_type);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_LOCAL_RELS 4	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(oldcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If we haven&amp;#39;t obtained critical system indexes yet, do it now
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Because catcache and/or opclass cache depend on critical system indexes in relcache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;criticalRelcachesBuilt) &lt;span style="color:#75715e"&gt;// If critical indexes not loaded
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;load_critical_index&lt;/span&gt;(ClassOidIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							RelationRelationId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;load_critical_index&lt;/span&gt;(TriggerRelidNameIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							TriggerRelationId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_LOCAL_INDEXES	7	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		criticalRelcachesBuilt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// Mark: critical system table indexes obtained
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Continue processing shared critical system table indexes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// These shared critical system tables are needed in certain situations (autovacuum, client authentication, etc.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;criticalSharedRelcachesBuilt)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;load_critical_index&lt;/span&gt;(DatabaseNameIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							DatabaseRelationId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;load_critical_index&lt;/span&gt;(SharedSecLabelObjectIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							SharedSecLabelRelationId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_SHARED_INDEXES 6	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		criticalSharedRelcachesBuilt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// Mark: shared critical system table indexes obtained
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Scan all entries in relcache and update those that are erroneous
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// from formrdesc or init file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If erroneous, read pg_class data and replace the erroneous entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Because the cache file does not contain rules, triggers, security policies,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// also fetch from pg_class
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; ((idhentry &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (RelIdCacheEnt &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#a6e22e"&gt;hash_seq_search&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;status)) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Relation	relation &lt;span style="color:#f92672"&gt;=&lt;/span&gt; idhentry&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;reldesc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Ensure relations in use are not flushed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationIncrementReferenceCount&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If it&amp;#39;s an erroneous entry, read the tuple from pg_class
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relowner &lt;span style="color:#f92672"&gt;==&lt;/span&gt; InvalidOid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;memcpy&lt;/span&gt;((&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel, (&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) relp, CLASS_TUPLE_SIZE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Update rd_option
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_options)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;pfree&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_options);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationParseRelOptions&lt;/span&gt;(relation, htup);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ReleaseSysCache&lt;/span&gt;(htup);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Fix data not in the init file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// For example, relhasrules, relhastriggers may be outdated or incorrect
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relhasrules &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rules &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationBuildRuleLock&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rules &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relhasrules &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relhastriggers &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;trigdesc &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationBuildTriggers&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;trigdesc &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relhastriggers &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Reload row security policies, since init file doesn&amp;#39;t contain them
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relrowsecurity &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rsdesc &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationBuildRowSecurity&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rsdesc &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If tableam needs reloading
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_tableam &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			(&lt;span style="color:#a6e22e"&gt;RELKIND_HAS_TABLE_AM&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_SEQUENCE))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationInitTableAccessMethod&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_tableam &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			restart &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Decrement reference count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationDecrementReferenceCount&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Finally, if needed, update the init file (since there may have been reloads, don&amp;#39;t waste them)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (needNewCacheFile)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;InitCatalogCachePhase2&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* now write the files */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;write_relcache_init_file&lt;/span&gt;(true); &lt;span style="color:#75715e"&gt;// Write global init file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;write_relcache_init_file&lt;/span&gt;(false); &lt;span style="color:#75715e"&gt;// Write private init file
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Compared to Stage 2 which loads 5 system tables, &lt;code&gt;RelationCacheInitializePhase3()&lt;/code&gt; loads more system tables, such as &lt;code&gt;pg_class&lt;/code&gt;, &lt;code&gt;pg_proc&lt;/code&gt;, and the indexes on these tables. Of course, the precondition for loading these rels is that they are not in cache or have expired. After reloading is complete, the &amp;ldquo;new&amp;rdquo; catalog is written to the init file.
Looking at the &lt;code&gt;write_relcache_init_file&lt;/code&gt; function source code when writing the init file, we can understand the meaning of the &lt;code&gt;true&lt;/code&gt; and &lt;code&gt;false&lt;/code&gt; parameters:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;write_relcache_init_file&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; shared)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (shared)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(tempfilename, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(tempfilename), &lt;span style="color:#e6db74"&gt;&amp;#34;global/%s.%d&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 RELCACHE_INIT_FILENAME, MyProcPid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(finalfilename, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(finalfilename), &lt;span style="color:#e6db74"&gt;&amp;#34;global/%s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 RELCACHE_INIT_FILENAME);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(tempfilename, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(tempfilename), &lt;span style="color:#e6db74"&gt;&amp;#34;%s/%s.%d&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 DatabasePath, RELCACHE_INIT_FILENAME, MyProcPid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;snprintf&lt;/span&gt;(finalfilename, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(finalfilename), &lt;span style="color:#e6db74"&gt;&amp;#34;%s/%s&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 DatabasePath, RELCACHE_INIT_FILENAME);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;true&lt;/code&gt; means write to the global init file.
&lt;code&gt;false&lt;/code&gt; means write to the local init file.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;RELCACHE_INIT_FILENAME&lt;/code&gt; parameter macro definition:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define RELCACHE_INIT_FILENAME &amp;#34;pg_internal.init&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So the written init files are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;shared: &lt;code&gt;global/pg_internal.init&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;local: &lt;code&gt;DatabasePath/pg_internal.init&lt;/code&gt; and &lt;code&gt;DatabasePath/pg_internal.init.myPID&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&amp;rsquo;s look at real init file paths:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ find ./ -name *init*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./global/pg_internal.init &lt;span style="color:#75715e"&gt;#shared&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/1/pg_internal.init &lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/13577/pg_internal.init &lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/13578/pg_internal.init	&lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/16398/pg_internal.init	&lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/16811/pg_internal.init	&lt;span style="color:#75715e"&gt;#local&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;./base/17674/pg_internal.init	&lt;span style="color:#75715e"&gt;#local&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Diagram of the three initialization stages call flow:



&lt;img src="https://lastdba.com/img/csdn/f743c6c69083.png" alt="Image" /&gt;
(&lt;a href="https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/" target="_blank" rel="noreferrer"&gt;https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Stage 4&lt;/em&gt;: &lt;code&gt;RelationIdGetRelation&lt;/code&gt;
Find a reldesc by OID. The caller only needs an AccessShareLock on the OID and is responsible for incrementing/decrementing the rel&amp;rsquo;s reference count.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Relation
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationIdGetRelation&lt;/span&gt;(Oid relationId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Relation	rd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Ensure we&amp;#39;re in a transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;IsTransactionState&lt;/span&gt;());
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// First try to find in cache via reldesc
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationIdCacheLookup&lt;/span&gt;(relationId, rd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RelationIsValid&lt;/span&gt;(rd))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Return NULL for dropped relations
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_droppedSubid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; InvalidSubTransactionId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_isvalid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationIncrementReferenceCount&lt;/span&gt;(rd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_isvalid) &lt;span style="color:#75715e"&gt;// If cached rel is invalid, revalidate it
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_INDEX &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				rd&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_PARTITIONED_INDEX) &lt;span style="color:#75715e"&gt;// Load index info directly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;RelationReloadIndexInfo&lt;/span&gt;(rd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#75715e"&gt;// For non-index, clear the reldesc
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;RelationClearRelation&lt;/span&gt;(rd, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; rd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// No reldesc found, create a new one
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	rd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;RelationBuildDesc&lt;/span&gt;(relationId, true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RelationIsValid&lt;/span&gt;(rd))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationIncrementReferenceCount&lt;/span&gt;(rd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; rd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationIdGetRelation&lt;/code&gt; is relatively simple: it obtains a reldesc and index info via OID.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Stage 5&lt;/em&gt;: &lt;code&gt;RelationClose&lt;/code&gt;
The code for &lt;code&gt;RelationClose&lt;/code&gt; is also quite simple:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationClose&lt;/span&gt;(Relation relation)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// No lock operations needed, simply decrement refcount
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;RelationDecrementReferenceCount&lt;/span&gt;(relation);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If no sessions have the relation open, partition descriptors can be deleted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RelationHasReferenceCountZero&lt;/span&gt;(relation))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pdcxt &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pdcxt&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;firstchild &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;MemoryContextDeleteChildren&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pdcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pddcxt &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pddcxt&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;firstchild &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;MemoryContextDeleteChildren&lt;/span&gt;(relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_pddcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#ifdef RELCACHE_FORCE_RELEASE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;RelationHasReferenceCountZero&lt;/span&gt;(relation) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_createSubid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; InvalidSubTransactionId &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		relation&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_firstRelfilenodeSubid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; InvalidSubTransactionId)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationClearRelation&lt;/span&gt;(relation, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#endif
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;RelationClose&lt;/code&gt; is the operation for closing access to a relation. Generally, this function only decrements the &lt;code&gt;refcount&lt;/code&gt; of sessions accessing the relation. However, there are exceptions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When &lt;code&gt;refcount&lt;/code&gt; is 0, &lt;code&gt;MemoryContextDeleteChildren()&lt;/code&gt; is executed. This function deletes the mctx related to &lt;em&gt;child partition descriptors&lt;/em&gt;, which does release memory.&lt;/li&gt;
&lt;li&gt;When &lt;code&gt;refcount&lt;/code&gt; is 0 and the macro &lt;code&gt;RELCACHE_FORCE_RELEASE&lt;/code&gt; is defined, the &lt;code&gt;RelationClearRelation()&lt;/code&gt; function deletes the hash table entry. This step does not release memory. The &lt;code&gt;RELCACHE_FORCE_RELEASE&lt;/code&gt; macro was not found (only available with explicit compilation?).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;relcache is not completely without memory release logic, but the trigger conditions are relatively strict, and the freed memory is not all of the relcache memory.&lt;/em&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;syscache/catcache
 &lt;div id="syscachecatcache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#syscachecatcache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;CatCache caches tuples from system tables. Built on top of CatCache is another layer called SysCache (KV interface). Essentially, CatCache and SysCache together reorganize data from system tables in memory using a KV approach.
syscache/catcache is more complex. Here I&amp;rsquo;ll briefly extract some easily interpretable content, mainly to understand the cached content and loading mechanism of syscache. For deeper source code analysis, refer to &lt;a href="https://blog.csdn.net/weixin_45644897/article/details/121254012" target="_blank" rel="noreferrer"&gt;PostgreSQL Source Analysis — Storage Management — Memory Management (3)&lt;/a&gt; and &lt;a href="https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/" target="_blank" rel="noreferrer"&gt;PostgreSQL RelCache and SysCache Caches&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;catcache struct&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; catcache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			id;				&lt;span style="color:#75715e"&gt;// cache id, defined in syscache.h
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			cc_nbuckets;	&lt;span style="color:#75715e"&gt;// number of hash buckets for this cache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TupleDesc	cc_tupdesc;		&lt;span style="color:#75715e"&gt;// tuple descriptor, copied from reldesc
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cc_relname;		&lt;span style="color:#75715e"&gt;// system table name corresponding to the tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			cc_reloid;		&lt;span style="color:#75715e"&gt;// system table OID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			cc_indexoid;	&lt;span style="color:#75715e"&gt;// index OID for cache key
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		cc_relisshared; &lt;span style="color:#75715e"&gt;// is the table shared across databases?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Statistics used by catcache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#ifdef CATCACHE_STATS
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;		cc_searches;	&lt;span style="color:#75715e"&gt;// number of queries against this catcache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;		cc_hits;		&lt;span style="color:#75715e"&gt;// hit count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;long&lt;/span&gt;		cc_neg_hits;	&lt;span style="color:#75715e"&gt;// negative entry hit count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#endif
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} CatCache;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;catcache entry&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; catctup
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			ct_magic;		&lt;span style="color:#75715e"&gt;// identifies this catctup entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CT_MAGIC 0x57261502
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint32		hash_value;		&lt;span style="color:#75715e"&gt;// hash key value for this tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Dead tuples won&amp;#39;t be returned, but will be removed from catcache when refcount reaches zero
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			refcount;		&lt;span style="color:#75715e"&gt;// tuple refcount, indicates whether it&amp;#39;s being accessed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		dead;			&lt;span style="color:#75715e"&gt;// dead tuple, but not yet cleaned up
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		negative;		&lt;span style="color:#75715e"&gt;// is this a negative cache entry?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HeapTupleData tuple;		&lt;span style="color:#75715e"&gt;// tuple header structure
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CatCache &lt;span style="color:#f92672"&gt;*&lt;/span&gt;my_cache;		&lt;span style="color:#75715e"&gt;// link to the catcache this tuple belongs to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} CatCTup;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;SearchCatCacheMiss() Function&lt;/strong&gt;
&lt;code&gt;SearchCatCacheMiss()&lt;/code&gt; is the main function for catcache hit/miss, and after a miss it accesses tuples from the dictionary.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; pg_noinline HeapTuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SearchCatCacheMiss&lt;/span&gt;(CatCache &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cache,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; nkeys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 uint32 hashValue,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Index hashIndex,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Datum v1,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Datum v2,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Datum v3,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Datum v4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ScanKeyData cur_skey[CATCACHE_MAXKEYS];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Relation	relation;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SysScanDesc scandesc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HeapTuple	ntp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CatCTup &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ct;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Datum		arguments[CATCACHE_MAXKEYS];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Tuple not found in cache, so try to find it directly from the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If found, add it to cache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If not found, add a negative cache entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	relation &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;table_open&lt;/span&gt;(cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_reloid, AccessShareLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	scandesc &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;systable_beginscan&lt;/span&gt;(relation,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_indexoid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#a6e22e"&gt;IndexScanOK&lt;/span&gt;(cache, cur_skey),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 NULL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 nkeys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 cur_skey);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ct &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When tuple is valid, create an entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleIsValid&lt;/span&gt;(ntp &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;systable_getnext&lt;/span&gt;(scandesc)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ct &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CatalogCacheCreateEntry&lt;/span&gt;(cache, ntp, arguments,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 hashValue, hashIndex,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 false); &lt;span style="color:#75715e"&gt;// Create an entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Immediately increment refcount
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResourceOwnerEnlargeCatCacheRefs&lt;/span&gt;(CurrentResourceOwner);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ct&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;refcount&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResourceOwnerRememberCatCacheRef&lt;/span&gt;(CurrentResourceOwner, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ct&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;					&lt;span style="color:#75715e"&gt;/* assume only one match */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;systable_endscan&lt;/span&gt;(scandesc);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;table_close&lt;/span&gt;(relation, AccessShareLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// If no tuple found, create a negative cache entry (a dummy tuple)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// The dummy tuple has key columns, all others are null
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// During startup, the invalidation mechanism is not active and entries
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// cannot be cleaned up if a tuple is actually created later
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	// So during this phase, negative entries are not created
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (ct &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL) &lt;span style="color:#75715e"&gt;// If no tuple found, enter the following logic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsBootstrapProcessingMode&lt;/span&gt;()) &lt;span style="color:#75715e"&gt;// Return NULL directly if in startup phase
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ct &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CatalogCacheCreateEntry&lt;/span&gt;(cache, NULL, arguments,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 hashValue, hashIndex,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 true); &lt;span style="color:#75715e"&gt;// Create entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CACHE_elog&lt;/span&gt;(DEBUG2, &lt;span style="color:#e6db74"&gt;&amp;#34;SearchCatCache(%s): Contains %d/%d tuples&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_relname, cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_ntup, CacheHdr&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ch_ntup);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CACHE_elog&lt;/span&gt;(DEBUG2, &lt;span style="color:#e6db74"&gt;&amp;#34;SearchCatCache(%s): put neg entry in bucket %d&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 cache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_relname, hashIndex);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Negative entries are not returned to caller, refcount remains 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ct&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;tuple;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The dummy tuple (&lt;em&gt;negative cache entry&lt;/em&gt;) here is brilliant — caching a non-existent tuple in catcache prevents needing to query the data dictionary again on the next access, avoiding repeated pointless data dictionary lookups.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Cache Validation Messages
 &lt;div id="cache-validation-messages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cache-validation-messages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When a tuple is updated or deleted, due to transaction visibility rules, these tuples that become invisible after the transaction ends need to be communicated to caches, invalidating the cached tuples so they can be reloaded on the next read. Similarly, when new tuples are inserted, negative cache entries in caches may also need to be flushed to match the new tuples. One common scenario is DDL — DDL may cause certain tuples in the metadata to become invalid, at which point cache validation messages need to be sent to various private caches to clean up cache entries.
This cache validation mechanism applies to managing private cache pools like syscache and relcache. Since idle backends won&amp;rsquo;t read sinval events, messages must be actively sent to allow lagging backends to &amp;ldquo;catch up.&amp;rdquo; When completing a transaction, invalidation events must be broadcast to other backends via the SI message queue.&lt;/p&gt;
&lt;p&gt;The source code is split into two parts: &lt;code&gt;sinval&lt;/code&gt; and &lt;code&gt;inval&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Invalidation interface: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/include/utils/inval.h;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/include/utils/inval.h&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation dispatch: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/backend/utils/cache/inval.c;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/backend/utils/cache/inval.c&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation message sharing interface: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/include/storage/sinval.h;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/include/storage/sinval.h&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation message sharing dispatch: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/backend/storage/ipc/sinval.c;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/backend/storage/ipc/sinval.c&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation message sharing data structures interface: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/include/storage/sinvaladt.h;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/include/storage/sinvaladt.h&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Invalidation message sharing data structures: &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;f=src/backend/storage/ipc/sinvaladt.c;hb=HEAD" target="_blank" rel="noreferrer"&gt;src/backend/storage/ipc/sinvaladt.c&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In &lt;code&gt;src/backend/utils/cache/inval.c&lt;/code&gt;, the shared-invalidation message structure is defined:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;union&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	int8		id;				&lt;span style="color:#75715e"&gt;/* type field --- must be first */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalCatcacheMsg cc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalCatalogMsg cat;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalRelcacheMsg rc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalSmgrMsg sm;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalRelmapMsg rm;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SharedInvalSnapshotMsg sn;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} SharedInvalidationMessage;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Shared-invalidation messages include the following types:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Invalidate a specific catcache entry&lt;/li&gt;
&lt;li&gt;Invalidate the entire catcache entry for a particular system catalog&lt;/li&gt;
&lt;li&gt;Invalidate a specific relcache entry&lt;/li&gt;
&lt;li&gt;Invalidate ALL relcache entries&lt;/li&gt;
&lt;li&gt;Invalidate the smgr cache entry for a particular physical relation&lt;/li&gt;
&lt;li&gt;Invalidate a mapped-relation&lt;/li&gt;
&lt;li&gt;Invalidate saved snapshots that scanned a relation&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;&lt;p&gt;Messages are located in the shared memory queue until all other processes read them. Normally, receiving processes only read messages at specific times, so if a receiving process is idle (not processing any user requests) or busy doing other things such that they don&amp;rsquo;t have time to read these messages, the messages may remain in shared memory indefinitely. In unfortunate situations, if this shared memory space is no longer available for processes to store new messages, that process will have to take on the cleanup task. (In practice, this cleanup is done proactively, so space rarely runs out.) To discard old messages, it must be ensured that all other processes have read them. If some processes cannot do so for the above reasons, it must explicitly signal the lagging processes to catch up. Once the lagging processes have caught up, these messages can be freely discarded.
When processing a message, it first checks whether the catalog tuple specified in the message is currently in the cache (the message also specifies the syscache identifier). If so, it is removed from the cache&amp;rsquo;s hash table. The next time that tuple is requested, it will be re-read from the underlying catalog table and added to the hash table, so subsequent accesses will read the new value. If a process has already locked a particular database object preventing concurrent processes from modifying it, it can continue using the cached tuple until the lock is released.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 class="relative group"&gt;xxCache Issues Summary
 &lt;div id="xxcache-issues-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#xxcache-issues-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are many types of xxCache, among which the more notable ones are plancache, relcache, and syscache. These caches belong to private memory and exist in each backend process. These caches have no LRU mechanism to evict stale data; they use invalidation messages to clean up globally-unneeded snapshots and metadata information, such as when an object is deleted.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;relcache is the place most likely to occupy significant memory. relcache loads metadata information, and during initialization it reads *.init files to speed up loading metadata into relcache. Later, when other metadata needs to be read, loading also occurs.&lt;/li&gt;
&lt;li&gt;catcache caches tuple information from the data dictionary. syscache is one layer above catcache — they can be understood as jointly implementing this data dictionary cache. If a tuple does not exist, a negative entry is created to avoid accessing the data dictionary again on the next visit. Similarly, a catcache miss will also read tuples from the data dictionary.&lt;/li&gt;
&lt;li&gt;Cache validation messages exist to inform caches that cached tuples and snapshot information have become stale. They can invalidate corresponding relcache and catcache entries. Entries are removed from the cache&amp;rsquo;s hash table, which releases memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Since the cache memory release mechanisms are very limited, when there is a lot of metadata (many tables, partition tables), relcache and catcache can consume a lot of memory — and this can happen for every backend.
&lt;em&gt;Possible solutions&lt;/em&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Global cache. Like Oracle&amp;rsquo;s dictionary cache, cache in one place with shared access. For example, &lt;a href="https://www.alibabacloud.com/help/en/polardb/polardb-for-postgresql/global-relcache-1" target="_blank" rel="noreferrer"&gt;PolarDB&amp;rsquo;s Global RelCache&lt;/a&gt; has already implemented this functionality.&lt;/li&gt;
&lt;li&gt;LRU. An LRU mechanism suitable for caches is needed to separate hot and cold ends, cleaning excessively old cache entries from the hash table. This might require cache limit parameters to restrict cache size — ideally one per cache&amp;hellip;&lt;/li&gt;
&lt;li&gt;Threading mode. Memory is shared and accessed by all threads — a natural advantage.&lt;/li&gt;
&lt;li&gt;Periodically disconnect long connections. All of the above are just wishful thinking.&lt;/li&gt;
&lt;li&gt;Don&amp;rsquo;t create too many tables or partitions (note that in PostgreSQL, partitions are also tables).&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Memory Contexts
 &lt;div id="memory-contexts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-contexts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL manages memory through the memory context mechanism. I previously did a &lt;a href="https://blog.csdn.net/qq_40687433/article/details/134796339?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;translation about memory contexts&lt;/a&gt;, roughly summarized as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;C language requires explicit memory deallocation. To reduce the risk of memory leaks, PostgreSQL implemented memory contexts to manage private memory.&lt;/li&gt;
&lt;li&gt;Memory contexts do not require freeing memory after each use; instead, memory is released by deleting a particular context.&lt;/li&gt;
&lt;li&gt;Memory contexts form a hierarchical structure — releasing a parent context recursively deletes all child contexts.&lt;/li&gt;
&lt;li&gt;Aside from debugging, observing memory context usage is quite difficult. Starting from PG14, the &lt;code&gt;pg_backend_memory_contexts&lt;/code&gt; view can observe the current memory context usage of the current session.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Timing of memory context creation during SQL operations:



&lt;img src="https://lastdba.com/img/csdn/b269e3547cbf.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://www.pgcon.org/2019/schedule/attachments/514_introduction-memory-contexts.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgcon.org/2019/schedule/attachments/514_introduction-memory-contexts.pdf&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In PostgreSQL, all memory allocation, deallocation, and resetting is done within memory contexts, so the &lt;code&gt;malloc()&lt;/code&gt;, &lt;code&gt;realloc()&lt;/code&gt;, and &lt;code&gt;free()&lt;/code&gt; system call functions are not used directly. Instead, &lt;code&gt;palloc()&lt;/code&gt;, &lt;code&gt;repalloc()&lt;/code&gt;, and &lt;code&gt;pfree()&lt;/code&gt; are used for memory allocation, reallocation, and deallocation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;C Library Memory Functions&lt;/strong&gt;
&lt;a href="https://www.geeksforgeeks.org/dynamic-memory-allocation-in-c-using-malloc-calloc-free-and-realloc/" target="_blank" rel="noreferrer"&gt;C library dynamic memory allocation functions&lt;/a&gt; include:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;malloc(): The C library&amp;rsquo;s malloc() function (memory allocation) is used to allocate large blocks of memory.&lt;/li&gt;
&lt;li&gt;calloc(): The C library&amp;rsquo;s calloc() function (contiguous allocation) is used to allocate contiguous memory.&lt;/li&gt;
&lt;li&gt;free(): Used to release memory. malloc() and calloc() do not release memory; after dynamic memory allocation, free() must be used to release it.&lt;/li&gt;
&lt;li&gt;realloc(): Used for memory re-allocation.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There is also a C library function &lt;a href="https://www.geeksforgeeks.org/memset-c-example/" target="_blank" rel="noreferrer"&gt;memset()&lt;/a&gt;, used to fill a memory block with a specific value.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL-Defined Memory Functions&lt;/strong&gt;
The functions actually heavily used in PostgreSQL source code for memory allocation, deallocation, etc., are &lt;code&gt;palloc()&lt;/code&gt;, &lt;code&gt;palloc0()&lt;/code&gt;, &lt;code&gt;repalloc()&lt;/code&gt;, and &lt;code&gt;pfree()&lt;/code&gt;. They mostly do not directly interact with OS memory (C library functions); only in certain cases do they call C library memory functions. This essentially adds a layer of protection over OS memory operations, with PostgreSQL handling small memory operations on its own.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;palloc()&lt;/strong&gt;:
&lt;code&gt;palloc()&lt;/code&gt; primarily calls the &lt;code&gt;alloc&lt;/code&gt; method of &lt;code&gt;MemoryContext&lt;/code&gt;. &lt;code&gt;alloc&lt;/code&gt; corresponds to calling the &lt;code&gt;MemoryContextAlloc&lt;/code&gt; function, which in turn calls the &lt;code&gt;AllocSetAlloc&lt;/code&gt; function specified in the methods field of the current memory context.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;palloc&lt;/span&gt;(Size size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* duplicates MemoryContextAlloc to avoid increased overhead */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ret;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext context &lt;span style="color:#f92672"&gt;=&lt;/span&gt; CurrentMemoryContext;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ret &lt;span style="color:#f92672"&gt;=&lt;/span&gt; context&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;methods&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;alloc&lt;/span&gt;(context, size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;....
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; ret;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;palloc0()&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;palloc0&lt;/span&gt;(Size size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ret &lt;span style="color:#f92672"&gt;=&lt;/span&gt; context&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;methods&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;alloc&lt;/span&gt;(context, size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;MemSetAligned&lt;/span&gt;(ret, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; ret;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;MemSetAligned&lt;/code&gt; is macro-defined and actually calls C library &lt;code&gt;memset&lt;/code&gt; for memory filling, but &lt;code&gt;MemSetAligned&lt;/code&gt; passes &lt;code&gt;0&lt;/code&gt; as the value.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MemSetAligned(start, val, len)\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;...\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	memset(_start, _val, _len); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;...	&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Compared to &lt;code&gt;palloc&lt;/code&gt;, &lt;code&gt;palloc0&lt;/code&gt; not only calls &lt;code&gt;alloc(context, size)&lt;/code&gt; but also zeroes out the memory content.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;repalloc()&lt;/strong&gt;:
&lt;code&gt;repalloc()&lt;/code&gt; primarily calls the &lt;code&gt;realloc&lt;/code&gt; method of &lt;code&gt;MemoryContext&lt;/code&gt;. The &lt;code&gt;realloc&lt;/code&gt; function pointer corresponds to the &lt;code&gt;AllocSetRealloc&lt;/code&gt; function.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * repalloc
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		Adjust the size of a previously allocated chunk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;repalloc&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pointer, Size size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext context &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetMemoryChunkContext&lt;/span&gt;(pointer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ret &lt;span style="color:#f92672"&gt;=&lt;/span&gt; context&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;methods&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;realloc&lt;/span&gt;(context, pointer, size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; ret;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;pfree()&lt;/strong&gt;:
pfree calls the &lt;code&gt;free_p&lt;/code&gt; function pointer in the methods field of the memory context to which the memory chunk belongs, to release the memory chunk&amp;rsquo;s space. Currently, in PostgreSQL, the &lt;code&gt;free_p&lt;/code&gt; pointer actually points to the &lt;code&gt;AllocSetFree&lt;/code&gt; function.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * pfree
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		Release an allocated chunk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;pfree&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pointer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MemoryContext context &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetMemoryChunkContext&lt;/span&gt;(pointer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	context&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;methods&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;free_p&lt;/span&gt;(context, pointer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;VALGRIND_MEMPOOL_FREE&lt;/span&gt;(context, pointer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;AllocSetAlloc Memory Allocation&lt;/strong&gt;
Looking at the alloc method within, alloc ultimately points to the &lt;code&gt;AllocSetAlloc&lt;/code&gt; function. &lt;code&gt;AllocSetAlloc&lt;/code&gt; looks rather complex, but it becomes easier to understand when read in segments:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;AllocSetAlloc&lt;/span&gt;(MemoryContext context, Size size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	AllocSet	set &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (AllocSet) context;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	AllocBlock	block;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	AllocChunk	chunk;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			fidx;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Size		chunk_size;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Size		blksize;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If requested memory exceeds the max chunk size, allocate an entire memory block
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (size &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; set&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;allocChunkLimit)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		block &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (AllocBlock) &lt;span style="color:#a6e22e"&gt;malloc&lt;/span&gt;(blksize);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If requested memory is less than chunk size, check free list for available free chunks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	fidx &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocSetFreeIndex&lt;/span&gt;(size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	chunk &lt;span style="color:#f92672"&gt;=&lt;/span&gt; set&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;freelist[fidx];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (chunk &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL) &lt;span style="color:#75715e"&gt;// There are chunks available in the free list
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(chunk&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;size &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; size);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		set&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;freelist[fidx] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (AllocChunk) chunk&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;aset;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		chunk&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;aset &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) set;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;AllocChunkGetPointer&lt;/span&gt;(chunk);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If there&amp;#39;s space, try to place the chunk in the allocation block; if not, create a new block
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; ((block &lt;span style="color:#f92672"&gt;=&lt;/span&gt; set&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;blocks) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Size		availspace &lt;span style="color:#f92672"&gt;=&lt;/span&gt; block&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;endptr &lt;span style="color:#f92672"&gt;-&lt;/span&gt; block&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;freeptr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (availspace &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; (chunk_size &lt;span style="color:#f92672"&gt;+&lt;/span&gt; ALLOC_CHUNKHDRSZ))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			block &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// No space, create a new block
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (block &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Size		required_size;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Requested block size is a power of 2, not exceeding maxBlockSize
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		required_size &lt;span style="color:#f92672"&gt;=&lt;/span&gt; chunk_size &lt;span style="color:#f92672"&gt;+&lt;/span&gt; ALLOC_BLOCKHDRSZ &lt;span style="color:#f92672"&gt;+&lt;/span&gt; ALLOC_CHUNKHDRSZ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (blksize &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; required_size)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			blksize &lt;span style="color:#f92672"&gt;&amp;lt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Use malloc to allocate the block, size is a power of 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		block &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (AllocBlock) &lt;span style="color:#a6e22e"&gt;malloc&lt;/span&gt;(blksize);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8731cbdd1398.png" alt="Alt text" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://smartkeyerror.com/PostgreSQL-MemoryContext" target="_blank" rel="noreferrer"&gt;https://smartkeyerror.com/PostgreSQL-MemoryContext&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;palloc() =&amp;gt; AllocSetAlloc()&lt;/code&gt; only calls &lt;code&gt;malloc()&lt;/code&gt; to request memory from the OS when the requested memory exceeds the chunk size limit or when there are no free blocks in the freelist. In all other cases, it takes existing free chunks from the freelist.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pfree()&lt;/code&gt; is similar (not demonstrated here):
&lt;code&gt;pfree() =&amp;gt; AllocSetFree()&lt;/code&gt; releases a specified memory chunk in a memory context. If the chunk to be freed is the only chunk in the memory block, &lt;code&gt;free()&lt;/code&gt; is called directly to release that memory block. Otherwise, the specified chunk is added to the freelist for the next allocation.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Viewing Memory Context Size
 &lt;div id="viewing-memory-context-size" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#viewing-memory-context-size" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;PG14+: &lt;code&gt;pg_backend_memory_contexts&lt;/code&gt; view to directly inspect memory context memory within the database.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_backend_memory_contexts &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; used_bytes &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LIMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ident &lt;span style="color:#f92672"&gt;|&lt;/span&gt; parent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; total_bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; total_nblocks &lt;span style="color:#f92672"&gt;|&lt;/span&gt; free_bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; free_chunks &lt;span style="color:#f92672"&gt;|&lt;/span&gt; used_bytes 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------+-------+------------------+-------+-------------+---------------+------------+-------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; CacheMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TopMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1048576&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;508216&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;540360&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Timezones &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TopMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;104120&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2616&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;101504&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TopMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;97680&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12904&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;84776&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ExecutorState &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PortalContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49208&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4424&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;44784&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WAL record construction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TopMemoryContext &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49768&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6360&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;43408&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;PG14+: &lt;code&gt;pg_log_backend_memory_contexts&lt;/code&gt; function outputs memory information to the log file, producing output similar to &lt;code&gt;MemoryContextStats(TopMemoryContext)&lt;/code&gt; log output.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_log_backend_memory_contexts(&lt;span style="color:#ae81ff"&gt;9293&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Universal — gdb &lt;code&gt;MemoryContextStats(TopMemoryContext)&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Use gdb to call &lt;code&gt;MemoryContextStats(TopMemoryContext)&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;gdb 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; attach &lt;span style="color:#ae81ff"&gt;9293&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;(&lt;/span&gt;gdb&lt;span style="color:#f92672"&gt;)&lt;/span&gt; p MemoryContextStats&lt;span style="color:#f92672"&gt;(&lt;/span&gt;TopMemoryContext&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; void&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Log output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TopMemoryContext: &lt;span style="color:#ae81ff"&gt;97680&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;16856&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;80824&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TableSpace cache: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;2088&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;6104&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RowDescriptionContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;6888&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1304&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; MessageContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;6888&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1304&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Operator class cache: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;552&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;7640&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Relcache by OID: &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;3504&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;12880&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; CacheMemoryContext: &lt;span style="color:#ae81ff"&gt;524288&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;90840&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;433448&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; index info: &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;904&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1144&lt;/span&gt; used: pg_statistic_ext_relid_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; index info: &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;824&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1224&lt;/span&gt; used: pg_database_oid_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; index info: &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;824&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1224&lt;/span&gt; used: pg_authid_rolname_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WAL record construction: &lt;span style="color:#ae81ff"&gt;49768&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;6360&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;43408&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PrivateRefCount: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;2616&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;5576&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; MdSmgr: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;7592&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LOCALLOCK hash: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;552&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;7640&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Timezones: &lt;span style="color:#ae81ff"&gt;104120&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;2616&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;101504&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ErrorContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;7928&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt; used&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/cac547b38cb3.png" alt="Image" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;references
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;src/backend/utils/mmgr/mcxt.c&lt;/p&gt;
&lt;p&gt;src/backend/utils/mmgr/README&lt;/p&gt;
&lt;p&gt;&lt;a href="https://momjian.us/main/writings/pgsql/inside_shmem.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/inside_shmem.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql02.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql02.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/runtime-config-resource.htm" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/runtime-config-resource.htm&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/kernel-resources.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/kernel-resources.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/weixin_45644897/article/details/121340327" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_45644897/article/details/121340327&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://help.aliyun.com/zh/polardb/polardb-for-postgresql/global-cache" target="_blank" rel="noreferrer"&gt;https://help.aliyun.com/zh/polardb/polardb-for-postgresql/global-cache&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_cache02.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_cache02.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/" target="_blank" rel="noreferrer"&gt;https://blog.japinli.top/2022/07/postgres-relcache-and-syscache/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://amitlan.com/2019/06/14/caches-inval.html" target="_blank" rel="noreferrer"&gt;https://amitlan.com/2019/06/14/caches-inval.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cybertec-postgresql.com/en/memory-context-for-postgresql-memory-management/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/memory-context-for-postgresql-memory-management/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.geeksforgeeks.org/dynamic-memory-allocation-in-c-using-malloc-calloc-free-and-realloc/" target="_blank" rel="noreferrer"&gt;https://www.geeksforgeeks.org/dynamic-memory-allocation-in-c-using-malloc-calloc-free-and-realloc/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_mmgr01.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_mmgr01.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_mmgr02.html" target="_blank" rel="noreferrer"&gt;https://www.cnblogs.com/feishujun/p/PostgreSQLSourceAnalysis_mmgr02.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://smartkeyerror.com/PostgreSQL-MemoryContext" target="_blank" rel="noreferrer"&gt;https://smartkeyerror.com/PostgreSQL-MemoryContext&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jnidzwetzki.github.io/2022/05/28/postgres-memory-context.html" target="_blank" rel="noreferrer"&gt;https://jnidzwetzki.github.io/2022/05/28/postgres-memory-context.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.pgcon.org/2019/schedule/attachments/514_introduction-memory-contexts.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgcon.org/2019/schedule/attachments/514_introduction-memory-contexts.pdf&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Brief Analysis of PostgreSQL TRUNCATE</title><link>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-truncate/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-brief-analysis-of-postgresql-truncate/</guid><description>&lt;h2 class="relative group"&gt;Command Options
 &lt;div id="command-options" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#command-options" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; [ &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; ] [ &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; ] name [ &lt;span style="color:#f92672"&gt;*&lt;/span&gt; ] [, ... ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [ &lt;span style="color:#66d9ef"&gt;RESTART&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONTINUE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; ] [ &lt;span style="color:#66d9ef"&gt;CASCADE&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;RESTRICT&lt;/span&gt; ]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;1&lt;/strong&gt;. &lt;code&gt;ONLY&lt;/code&gt;: truncate only the specified table. When a table has inheritance children or child partitions, by default they are truncated together; ONLY can truncate just the inheritance parent table. Partitioned parent tables cannot specify ONLY.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Cannot truncate only a partitioned parent table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;only&lt;/span&gt; parttable;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42809&lt;/span&gt;: cannot &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;only&lt;/span&gt; a partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: &lt;span style="color:#66d9ef"&gt;Do&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; specify the &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; keyword, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; use &lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; the partitions directly.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ExecuteTruncate, tablecmds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1655&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- truncate only the inheritance parent table, only the parent is cleaned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;only&lt;/span&gt; parenttable;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; parenttable &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; childtable &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Directly truncate the inheritance parent table, child tables are also cleaned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; parenttable;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; parenttable &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2&lt;/strong&gt;. &lt;code&gt;RESTART IDENTITY&lt;/code&gt; &lt;code&gt;CONTINUE IDENTITY&lt;/code&gt;: whether to reset sequences on columns. Default is CONTINUE.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Command Options
 &lt;div id="command-options" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#command-options" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; [ &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; ] [ &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; ] name [ &lt;span style="color:#f92672"&gt;*&lt;/span&gt; ] [, ... ]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [ &lt;span style="color:#66d9ef"&gt;RESTART&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONTINUE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt; ] [ &lt;span style="color:#66d9ef"&gt;CASCADE&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;RESTRICT&lt;/span&gt; ]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;1&lt;/strong&gt;. &lt;code&gt;ONLY&lt;/code&gt;: truncate only the specified table. When a table has inheritance children or child partitions, by default they are truncated together; ONLY can truncate just the inheritance parent table. Partitioned parent tables cannot specify ONLY.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Cannot truncate only a partitioned parent table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;only&lt;/span&gt; parttable;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42809&lt;/span&gt;: cannot &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;only&lt;/span&gt; a partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: &lt;span style="color:#66d9ef"&gt;Do&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; specify the &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; keyword, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; use &lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; the partitions directly.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ExecuteTruncate, tablecmds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1655&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- truncate only the inheritance parent table, only the parent is cleaned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;only&lt;/span&gt; parenttable;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; parenttable &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; childtable &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Directly truncate the inheritance parent table, child tables are also cleaned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; parenttable;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; parenttable &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2&lt;/strong&gt;. &lt;code&gt;RESTART IDENTITY&lt;/code&gt; &lt;code&gt;CONTINUE IDENTITY&lt;/code&gt;: whether to reset sequences on columns. Default is CONTINUE.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- bigserial creates a column sequence by default
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tableserial (a bigserial &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,b name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; tableserial;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.tableserial&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+--------+-----------+----------+----------------------------------------+---------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;tableserial_a_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; b &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tableserial(b) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; md5(random()::text) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- seq current value is 1000
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; currval(&lt;span style="color:#e6db74"&gt;&amp;#39;tableserial_a_seq&amp;#39;&lt;/span&gt;::regclass);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; currval
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Direct truncate does not reset sequences by default
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tableserial;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; currval(&lt;span style="color:#e6db74"&gt;&amp;#39;tableserial_a_seq&amp;#39;&lt;/span&gt;::regclass) cur,nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;tableserial_a_seq&amp;#39;&lt;/span&gt;::regclass);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cur &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Explicitly specify RESTART IDENTITY to reset sequences
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tableserial &lt;span style="color:#66d9ef"&gt;RESTART&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Note: seq is reset on nextval
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; currval(&lt;span style="color:#e6db74"&gt;&amp;#39;tableserial_a_seq&amp;#39;&lt;/span&gt;::regclass) cur,nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;tableserial_a_seq&amp;#39;&lt;/span&gt;::regclass);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cur &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3&lt;/strong&gt;. &lt;code&gt;CASCADE&lt;/code&gt;: truncate the table and all foreign key referencing tables.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create primary table, foreign key table, and data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; pri_tab(id bigint &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,name varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; pri_tab &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;),(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;),(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; frn_tab(id bigint,&lt;span style="color:#66d9ef"&gt;FOREIGN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt; (id) &lt;span style="color:#66d9ef"&gt;REFERENCES&lt;/span&gt; pri_tab(id));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; frn_tab &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;),(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pri_tab;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Foreign key table frn_tab depends on pri_tab&amp;#39;s data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; frn_tab;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With foreign key references on the primary table, CASCADE is required on the foreign key table, otherwise truncate fails
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; pri_tab ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;A000: cannot &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; referenced &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;foreign&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;frn_tab&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;references&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pri_tab&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: &lt;span style="color:#66d9ef"&gt;Truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;frn_tab&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; the same time, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; use &lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; ... &lt;span style="color:#66d9ef"&gt;CASCADE&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: heap_truncate_check_FKs, heap.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3427&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Clear foreign key constrained tables together
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; pri_tab &lt;span style="color:#66d9ef"&gt;cascade&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NOTICE: &lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; cascades &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;frn_tab&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ExecuteTruncateGuts, tablecmds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1725&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pri_tab;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; frn_tab;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since the foreign key table depends on the primary table&amp;rsquo;s data, you cannot directly truncate the primary table — you must add CASCADE, at which point the foreign key table is also cleared along with the primary table.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4&lt;/strong&gt;. &lt;code&gt;RESTRICT&lt;/code&gt;
Whether to clear foreign key tables. Not very useful — it&amp;rsquo;s the default option, and behavior is the same whether specified or not. Use CASCADE to clear associated foreign key tables.&lt;/p&gt;

&lt;h2 class="relative group"&gt;MVCC / Transaction
 &lt;div id="mvcc--transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mvcc--transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The PG official documentation has this passage:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;code&gt;TRUNCATE&lt;/code&gt; is not MVCC-safe. After truncation, the table will appear empty to concurrent transactions, if they are using a snapshot taken before the truncation occurred.
&lt;code&gt;TRUNCATE&lt;/code&gt; is transaction-safe with respect to the data in the tables: the truncation will be safely rolled back if the surrounding transaction does not commit.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;transaction-safe means it can be placed inside a transaction block and can be rolled back.
Rolling back truncate:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; t1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;not MVCC-safe means: if a session takes a snapshot before truncate, and a truncate occurs during the snapshot period, that snapshot can read the result after truncate. This does not conform to MVCC.
However, this isn&amp;rsquo;t a big issue in session scenarios, because truncate takes an 8-level lock (AccessExclusiveLock). If the snapshot hasn&amp;rsquo;t ended, at minimum there&amp;rsquo;s a read shared lock on the table, so truncate won&amp;rsquo;t execute.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;This will only be an issue for a transaction that did not access the table in question before the DDL command started — any transaction that has done so would hold at least an &lt;code&gt;ACCESS SHARE&lt;/code&gt; table lock, which would block the DDL command until that transaction completes.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;Feature Updates
 &lt;div id="feature-updates" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#feature-updates" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c1c4036557be.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;
&lt;p&gt;There aren&amp;rsquo;t many truncate feature updates. Just note that PG14 added support for truncating foreign tables. The prerequisite for truncating foreign tables is that the FDW must support the TRUNCATE API.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Also it extends postgres_fdw so that it can issue TRUNCATE command to foreign servers, by adding new routine for that TRUNCATE API.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;Functional Differences Between pg TRUNCATE and Other Databases
 &lt;div id="functional-differences-between-pg-truncate-and-other-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#functional-differences-between-pg-truncate-and-other-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8b204baa363f.png" alt="在这里插入图片描述" /&gt;



&lt;img src="https://lastdba.com/img/csdn/b7ebb636b6b2.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;
&lt;p&gt;TRUNCATE being fast and an 8-level lock are already well-known traits. Compared to other databases, PG can also: &lt;strong&gt;choose whether to reset sequences&lt;/strong&gt; (&lt;code&gt;RESTART IDENTITY&lt;/code&gt; &lt;code&gt;CONTINUE IDENTITY&lt;/code&gt;), &lt;strong&gt;rollback&lt;/strong&gt;, and has &lt;strong&gt;simple authorization&lt;/strong&gt;.&lt;/p&gt;

&lt;h2 class="relative group"&gt;What TRUNCATE Does
 &lt;div id="what-truncate-does" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-truncate-does" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl(a int);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; lzl_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; sequence lzl_seq &lt;span style="color:#66d9ef"&gt;start&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl_seq&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--select pg_relation_filepath(&amp;#39;lzl&amp;#39;);&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- db path
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_database &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; datname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzldb&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;418679&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- When first created, each rel&amp;#39;s oid = relfilenode
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,oid,relfilenode,relkind &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relfilenode &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relkind
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------+-------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428363&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428363&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428366&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428366&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; i
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_seq &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428367&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428367&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; S
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,oid,relfilenode,relkind &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relfilenode &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relkind
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------+-------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428363&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428370&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428366&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; i
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_seq &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428367&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428367&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; S
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After truncate, table and index were rebuilt, but sequence was not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;RESTART&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,oid,relfilenode,relkind &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relfilenode &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relkind
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------+-------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428363&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428372&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428366&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428373&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; i
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_seq &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428367&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428367&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; S
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Even with explicit RESTART, sequence was still not rebuilt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; sequence lzl_seq &lt;span style="color:#66d9ef"&gt;restart&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; SEQUENCE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,oid,relfilenode,relkind &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relfilenode &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relkind
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------+-------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428363&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428372&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428366&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428373&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; i
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_seq &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428367&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428374&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; S
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Explicitly restarting the sequence DOES rebuild it&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;truncate ... RESTART IDENTITY&lt;/code&gt; did not rebuild our sequence, while &lt;code&gt;alter sequence lzl_seq restart&lt;/code&gt; did rebuild the sequence. It seems the understanding of &lt;code&gt;RESTART IDENTITY&lt;/code&gt; was wrong. Let&amp;rsquo;s look at the official documentation for &lt;code&gt;RESTART IDENTITY&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Automatically restart sequences owned by columns of the truncated table(s).&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The sequence must be &lt;code&gt;owned by&lt;/code&gt; a column on the table — note: not &lt;code&gt;owner to&lt;/code&gt;. Although &lt;code&gt;\d&lt;/code&gt; shows sequences on the table, they may not belong to the table.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzl&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+-----------+----------+------------------------------+---------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Use &lt;code&gt;owned by&lt;/code&gt; to modify the sequence&amp;rsquo;s owning table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; SEQUENCE lzl_seq OWNED &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; lzl.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; SEQUENCE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check sequence owner information
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; s.relname &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; seq, n.nspname &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; sch, t.relname &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; tab, a.attname &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; col
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_class s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_depend d &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; d.objid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;s.oid &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; d.classid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;pg_class&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; d.refclassid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;pg_class&amp;#39;&lt;/span&gt;::regclass
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_class t &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; t.oid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.refobjid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_namespace n &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; n.oid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;t.relnamespace
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; pg_attribute a &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; a.attrelid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;t.oid &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; a.attnum&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.refobjsubid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; s.relkind&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;S&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; d.deptype&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; seq &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sch &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tab &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+--------+-------------+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableserial_a_seq &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tableserial &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_seq &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;RESTART&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IDENTITY&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,oid,relfilenode,relkind &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relfilenode &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relkind
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------+-------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428363&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428375&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428366&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428376&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; i
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_seq &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428367&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;428377&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; S&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When a sequence is &lt;code&gt;owned by&lt;/code&gt; a column on the table, explicitly specifying &lt;code&gt;RESTART IDENTITY&lt;/code&gt; with truncate will restart that sequence, which also rebuilds the sequence. &lt;strong&gt;Sequences created via serial/bigserial are owned by the table and are dropped when the table is dropped; sequences not owned by a table are not dropped when the table is dropped&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Summary of truncate rebuild characteristics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Direct &lt;code&gt;truncate table&lt;/code&gt; rebuilds the table and indexes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;truncate table&lt;/code&gt; + &lt;code&gt;RESTART IDENTITY&lt;/code&gt; rebuilds (i.e., restarts) sequences that belong to this table. If a sequence doesn&amp;rsquo;t belong to this table, even if the column&amp;rsquo;s default is associated with the sequence, the sequence won&amp;rsquo;t be rebuilt&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;TRUNCATE is also a utility command, and the entry function can be found quickly.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;ExecuteTruncate&lt;/code&gt; in &lt;code&gt;src/backend/commands/tablecmds.c&lt;/code&gt; is the entry function. The comments already explain that truncate must acquire an exclusive lock, check permissions and relation validity, and recursively check all tables that need to be truncated.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExecuteTruncate&lt;/span&gt;(TruncateStmt &lt;span style="color:#f92672"&gt;*&lt;/span&gt;stmt)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;* Open, exclusive-lock, and check all the explicitly-specified relations
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, stmt&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relations)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LOCKMODE lockmode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; AccessExclusiveLock; &lt;span style="color:#75715e"&gt;// Level 8 lock
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;rel &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;table_open&lt;/span&gt;(myrelid, NoLock); &lt;span style="color:#75715e"&gt;// Open table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExecuteTruncate&lt;/span&gt;(TruncateStmt &lt;span style="color:#f92672"&gt;*&lt;/span&gt;stmt)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, stmt&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relations)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		LOCKMODE	lockmode &lt;span style="color:#f92672"&gt;=&lt;/span&gt; AccessExclusiveLock; &lt;span style="color:#75715e"&gt;// Level 8 lock
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* open the relation, we already hold a lock on it */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		rel &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;table_open&lt;/span&gt;(myrelid, NoLock); &lt;span style="color:#75715e"&gt;// Open table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;truncate_check_activity&lt;/span&gt;(rel); &lt;span style="color:#75715e"&gt;// Even with the lock, verify it&amp;#39;s not in use
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (recurse) &lt;span style="color:#75715e"&gt;// Recursive execution
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			children &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;find_all_inheritors&lt;/span&gt;(myrelid, lockmode, NULL); &lt;span style="color:#75715e"&gt;// Find all inheritance children
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(child, children)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#75715e"&gt;// Above only checked the parent table, recursion checks children
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;truncate_check_rel&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;RelationGetRelid&lt;/span&gt;(rel), rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;truncate_check_activity&lt;/span&gt;(rel);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				rels &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;lappend&lt;/span&gt;(rels, rel); &lt;span style="color:#75715e"&gt;// Add to the list of rels to truncate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				relids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;lappend_oid&lt;/span&gt;(relids, childrelid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Recursion ends
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// truncate only on partitioned parent table? error directly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_PARTITIONED_TABLE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(ERROR,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_WRONG_OBJECT_TYPE),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;cannot truncate only a partitioned table&amp;#34;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					 &lt;span style="color:#a6e22e"&gt;errhint&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;Do not specify the ONLY keyword, or use TRUNCATE ONLY on the partitions directly.&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Main function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;ExecuteTruncateGuts&lt;/span&gt;(rels, relids, relids_logged,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						stmt&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;behavior, stmt&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;restart_seqs);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* And close the rels */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, rels)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Relation	rel &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (Relation) &lt;span style="color:#a6e22e"&gt;lfirst&lt;/span&gt;(cell);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;table_close&lt;/span&gt;(rel, NoLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;ExecuteTruncateGuts&lt;/code&gt; is called not only by the TRUNCATE command but also by the subscription side (publication/subscription can synchronize TRUNCATE).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ExecuteTruncateGuts&lt;/span&gt;(List &lt;span style="color:#f92672"&gt;*&lt;/span&gt;explicit_rels,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					List &lt;span style="color:#f92672"&gt;*&lt;/span&gt;relids,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					List &lt;span style="color:#f92672"&gt;*&lt;/span&gt;relids_logged,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					DropBehavior behavior, &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; restart_seqs)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	rels &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;list_copy&lt;/span&gt;(explicit_rels);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (behavior &lt;span style="color:#f92672"&gt;==&lt;/span&gt; DROP_CASCADE) &lt;span style="color:#75715e"&gt;// If CASCADE option specified, extract all referencing relations
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			newrelids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;heap_truncate_find_FKs&lt;/span&gt;(relids); &lt;span style="color:#75715e"&gt;// Find FKs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (newrelids &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NIL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;			&lt;span style="color:#75715e"&gt;/* nothing else to add */&lt;/span&gt; &lt;span style="color:#75715e"&gt;// No rels, exit directly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, newrelids)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				rel &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;table_open&lt;/span&gt;(relid, AccessExclusiveLock); &lt;span style="color:#75715e"&gt;// All rels acquire AccessExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(NOTICE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;truncate cascades to table &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								&lt;span style="color:#a6e22e"&gt;RelationGetRelationName&lt;/span&gt;(rel))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;truncate_check_rel&lt;/span&gt;(relid, rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel); &lt;span style="color:#75715e"&gt;// Check if it&amp;#39;s a truncatable object — must be a data-storing table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;truncate_check_perms&lt;/span&gt;(relid, rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel); &lt;span style="color:#75715e"&gt;// Check permissions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;truncate_check_activity&lt;/span&gt;(rel); &lt;span style="color:#75715e"&gt;// Check if in use
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (restart_seqs) &lt;span style="color:#75715e"&gt;// Handle restart seq
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, rels)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			Relation	rel &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (Relation) &lt;span style="color:#a6e22e"&gt;lfirst&lt;/span&gt;(cell);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			List	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;seqlist &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;getOwnedSequences&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;RelationGetRelid&lt;/span&gt;(rel));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Only check sequence permissions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;pg_class_ownercheck&lt;/span&gt;(seq_relid, &lt;span style="color:#a6e22e"&gt;GetUserId&lt;/span&gt;()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;aclcheck_error&lt;/span&gt;(ACLCHECK_NOT_OWNER, OBJECT_SEQUENCE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 &lt;span style="color:#a6e22e"&gt;RelationGetRelationName&lt;/span&gt;(seq_rel));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Execute all BEFORE TRUNCATE triggers
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, rels)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ExecBSTruncateTriggers&lt;/span&gt;(estate, resultRelInfo);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		resultRelInfo&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Begin the actual truncate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, rels)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If it&amp;#39;s a partitioned parent table, do nothing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_PARTITIONED_TABLE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Handle foreign tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relkind &lt;span style="color:#f92672"&gt;==&lt;/span&gt; RELKIND_FOREIGN_TABLE)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 &lt;span style="color:#75715e"&gt;// If same transaction (may rollback), directly execute heap_truncate_one_rel without creating new relfilenode
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_createSubid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; mySubid &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_newRelfilenodeSubid &lt;span style="color:#f92672"&gt;==&lt;/span&gt; mySubid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Immediate, non-rollbackable truncation is OK */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;heap_truncate_one_rel&lt;/span&gt;(rel);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Set NewRelfilenode
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;RelationSetNewRelfilenode&lt;/span&gt;(rel, rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relpersistence);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			heap_relid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;RelationGetRelid&lt;/span&gt;(rel);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#75715e"&gt;// Same for toast
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			toast_relid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;reltoastrelid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;OidIsValid&lt;/span&gt;(toast_relid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				Relation	toastrel &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;relation_open&lt;/span&gt;(toast_relid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;													 AccessExclusiveLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;RelationSetNewRelfilenode&lt;/span&gt;(toastrel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 toastrel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rd_rel&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;relpersistence);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;table_close&lt;/span&gt;(toastrel, NoLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#75715e"&gt;// Rebuild indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;reindex_relation&lt;/span&gt;(heap_relid, REINDEX_REL_PROCESS_TOAST,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;reindex_params);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pgstat_count_truncate&lt;/span&gt;(rel); &lt;span style="color:#75715e"&gt;// Update pgstat truncate count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Reset sequences
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, seq_relids)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Oid			seq_relid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;lfirst_oid&lt;/span&gt;(cell);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ResetSequence&lt;/span&gt;(seq_relid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Write WAL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;list_length&lt;/span&gt;(relids_logged) &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Fire AFTER TRUNCATE triggers
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	resultRelInfo &lt;span style="color:#f92672"&gt;=&lt;/span&gt; resultRelInfos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;foreach&lt;/span&gt;(cell, rels)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ExecASTruncateTriggers&lt;/span&gt;(estate, resultRelInfo);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		resultRelInfo&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;ExecuteTruncateGuts&lt;/code&gt; function processes according to truncate options, with the following flow:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Find all referencing foreign key tables based on CASCADE option&lt;/li&gt;
&lt;li&gt;Fire BEFORE TRUNCATE triggers&lt;/li&gt;
&lt;li&gt;Execute truncate&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;If same transaction, don&amp;rsquo;t immediately create &lt;code&gt;NewRelfilenode&lt;/code&gt;, directly call &lt;code&gt;heap_truncate_one_rel&lt;/code&gt; for truncation&lt;/li&gt;
&lt;li&gt;If not same transaction, call &lt;code&gt;RelationSetNewRelfilenode&lt;/code&gt; to create new &lt;code&gt;NewRelfilenode&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;&lt;code&gt;reindex_relation&lt;/code&gt; function rebuilds indexes&lt;/li&gt;
&lt;li&gt;Reset sequences based on RESTART IDENTITY&lt;/li&gt;
&lt;li&gt;Write WAL log&lt;/li&gt;
&lt;li&gt;Fire AFTER TRUNCATE triggers&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Tracing further, there&amp;rsquo;s quite a bit of function nesting:
&lt;code&gt;RelationSetNewRelfilenode&lt;/code&gt;
&lt;code&gt;table_relation_set_new_filenode&lt;/code&gt;
&lt;code&gt;relation_set_new_filenode&lt;/code&gt;
&lt;code&gt;heapam_relation_set_new_filenode&lt;/code&gt;
&lt;code&gt;RelationCreateStorage&lt;/code&gt;
Then to &lt;code&gt;smgrcreate&lt;/code&gt; and &lt;code&gt;smgr_create&lt;/code&gt; in &lt;code&gt;src/backend/storage/smgr/smgr.c&lt;/code&gt;. The comment for &lt;code&gt;smgr.c&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;public interface routines to storage manager switch
All file system operations in POSTGRES dispatch through these routines.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Any file system operation goes through smgr (storage manager); at this point it becomes file system operations.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Reference
 &lt;div id="reference" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reference" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/15/sql-truncate.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/15/sql-truncate.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/mvcc-caveats.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/mvcc-caveats.html&lt;/a&gt;
&lt;a href="https://pgpedia.info/t/truncate.html" target="_blank" rel="noreferrer"&gt;https://pgpedia.info/t/truncate.html&lt;/a&gt;
&lt;a href="https://www.orafaq.com/wiki/SQL_FAQ" target="_blank" rel="noreferrer"&gt;https://www.orafaq.com/wiki/SQL_FAQ&lt;/a&gt;
&lt;a href="https://learnsql.com/blog/difference-between-truncate-delete-and-drop-table-in-sql/" target="_blank" rel="noreferrer"&gt;https://learnsql.com/blog/difference-between-truncate-delete-and-drop-table-in-sql/&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>A Classic Case of Long Transaction, Table Bloat, and LIMIT Issues</title><link>https://lastdba.com/en/2024/08/12/a-classic-case-of-long-transaction-table-bloat-and-limit-issues/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-classic-case-of-long-transaction-table-bloat-and-limit-issues/</guid><description>&lt;h1 class="relative group"&gt;Slow Primary Key Update — Problem Analysis
 &lt;div id="slow-primary-key-update--problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#slow-primary-key-update--problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;A simple primary key update took over 1 second to execute. Due to high concurrency, the CPU was completely maxed out:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;084&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;lzlopr&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;158751&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.78.149:51502&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66055&lt;/span&gt;a6b.&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;c1f,&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;528&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19816630&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;970251337&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 1218.688 ms plan:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;Query Text: update table_a set (omitted...）=$6 where column_id =$7
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;Update on table_a (cost=0.40..5.49 rows=1 width=2774)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;-&amp;gt; Index Scan using pk_id on table_a (cost=0.40..5.49 rows=1 width=2774)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; Index Cond: ((column_id)::text = $7)&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SQL itself is very simple — an update with a condition on the primary key. Looking at the execution plan, it used the &lt;code&gt;pk_id&lt;/code&gt; primary key index, so there was no problem with the plan itself; the issue wasn&amp;rsquo;t a plan change.&lt;/p&gt;</description><content:encoded>
&lt;h1 class="relative group"&gt;Slow Primary Key Update — Problem Analysis
 &lt;div id="slow-primary-key-update--problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#slow-primary-key-update--problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;A simple primary key update took over 1 second to execute. Due to high concurrency, the CPU was completely maxed out:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;084&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;lzlopr&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;158751&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;10.33.78.149:51502&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66055&lt;/span&gt;a6b.&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;c1f,&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;528&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19816630&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;970251337&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 1218.688 ms plan:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;Query Text: update table_a set (omitted...）=$6 where column_id =$7
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;Update on table_a (cost=0.40..5.49 rows=1 width=2774)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;-&amp;gt; Index Scan using pk_id on table_a (cost=0.40..5.49 rows=1 width=2774)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt; Index Cond: ((column_id)::text = $7)&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SQL itself is very simple — an update with a condition on the primary key. Looking at the execution plan, it used the &lt;code&gt;pk_id&lt;/code&gt; primary key index, so there was no problem with the plan itself; the issue wasn&amp;rsquo;t a plan change.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s rewrite the SQL (since it&amp;rsquo;s an UPDATE) and use &lt;code&gt;explain (analyze,buffers)&lt;/code&gt; to compare the execution cost:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; table_a &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; column_id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;91&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1156&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;052&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;123&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;354&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((column_id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Blocks: exact&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13870&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; pk_id (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;91&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;464&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;465&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13866&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((column_id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4261&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Planning Time: &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;028&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Time: &lt;span style="color:#ae81ff"&gt;123&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;567&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The actual execution plan is fine, but &lt;code&gt;shared hit=13870&lt;/code&gt; is clearly way too high. Normally, a primary key lookup shouldn&amp;rsquo;t scan that many pages. This strongly suggests table bloat.&lt;/p&gt;
&lt;p&gt;Checking table bloat:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Table size \dt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;525&lt;/span&gt; MB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Actual row count
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;827&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Dead tuples from pg_stat_all_tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_live_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;786&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_dead_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;657604&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Only ~800 live tuples but 650K dead tuples! This explains why the primary key scan visited so many pages. But why weren&amp;rsquo;t the dead tuples reclaimed?&lt;/p&gt;
&lt;p&gt;When a table exceeds the default 20% modification threshold, autovacuum triggers vacuum to reclaim space. We can see in the logs that autovacuum was indeed being triggered:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:13:46.649 CST,,,14081,,660a5099.3701,1,,2024-04-01 14:13:45 CST,259/17828993,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:13:47.801 CST,,,14081,,660a5099.3701,2,,2024-04-01 14:13:45 CST,259/17828994,971045014,LOG,00000,&amp;#34;&lt;/span&gt;automatic analyze of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt; system usage: CPU: user: 0.08 s, system: 0.01 s, elapsed: 1.15 s&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,,,,,&amp;#34;&amp;#34;,&amp;#34;&lt;/span&gt;autovacuum worker&lt;span style="color:#e6db74"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:14:46.673 CST,,,26136,,660a50d5.6618,1,,2024-04-01 14:14:45 CST,259/17829090,0,LOG,00000,&amp;#34;&lt;/span&gt;automatic vacuum of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;: index scans: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:14:47.833 CST,,,26136,,660a50d5.6618,2,,2024-04-01 14:14:45 CST,259/17829091,971049759,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic analyze of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34; system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.15 s&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:15:46.680 CST,,,40743,,660a5111.9f27,1,,2024-04-01 14:15:45 CST,259/17829164,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:15:47.849 CST,,,40743,,660a5111.9f27,2,,2024-04-01 14:15:45 CST,259/17829165,971055464,LOG,00000,&amp;#34;&lt;/span&gt;automatic analyze of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt; system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.16 s&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,,,,,&amp;#34;&amp;#34;,&amp;#34;&lt;/span&gt;autovacuum worker&lt;span style="color:#e6db74"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:16:46.677 CST,,,52599,,660a514d.cd77,1,,2024-04-01 14:16:45 CST,259/17829263,0,LOG,00000,&amp;#34;&lt;/span&gt;automatic vacuum of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;: index scans: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:16:47.844 CST,,,52599,,660a514d.cd77,2,,2024-04-01 14:16:45 CST,259/17829264,971061382,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic analyze of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34; system usage: CPU: user: 0.08 s, system: 0.03 s, elapsed: 1.16 s&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:17:46.699 CST,,,64858,,660a5189.fd5a,1,,2024-04-01 14:17:45 CST,234/16589539,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:17:47.851 CST,,,64858,,660a5189.fd5a,2,,2024-04-01 14:17:45 CST,234/16589540,971066091,LOG,00000,&amp;#34;&lt;/span&gt;automatic analyze of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt; system usage: CPU: user: 0.09 s, system: 0.02 s, elapsed: 1.15 s&lt;span style="color:#e6db74"&gt;&amp;#34;,,,,,,,,,&amp;#34;&amp;#34;,&amp;#34;&lt;/span&gt;autovacuum worker&lt;span style="color:#e6db74"&gt;&amp;#34;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;2024-04-01 14:18:46.703 CST,,,78112,,660a51c5.13120,1,,2024-04-01 14:18:45 CST,259/17829409,0,LOG,00000,&amp;#34;&lt;/span&gt;automatic vacuum of table &lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;lzl.public.table_a&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;: index scans: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 14:18:47.854 CST,,,78112,,660a51c5.13120,2,,2024-04-01 14:18:45 CST,259/17829410,971070390,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic analyze of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34; system usage: CPU: user: 0.09 s, system: 0.02 s, elapsed: 1.15 s&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;		&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Not only was it triggered, but the interval was exactly 1 minute. The default &lt;code&gt;autovacuum_naptime&lt;/code&gt; is 1 minute:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; autovacuum_naptime ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;autovacuum_naptime 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can conclude:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;autovacuum was successfully triggered&lt;/li&gt;
&lt;li&gt;Dead tuples either couldn&amp;rsquo;t be reclaimed fast enough — the dead tuples generated within 1 minute exceeded 20% (maybe 1 minute is too long); or they weren&amp;rsquo;t being reclaimed at all, guaranteeing the next autovacuum trigger&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&amp;rsquo;s look at the detailed autovacuum output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-01 10:22:44.648 CST,,,16827,,660a1a73.41bb,1,,2024-04-01 10:22:43 CST,170/16910186,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;automatic vacuum of table &amp;#34;&amp;#34;lzl.public.table_a&amp;#34;&amp;#34;: index scans: 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;pages: 0 removed, 48745 remain, 6 skipped due to pins, 0 skipped frozen
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;tuples: 0 removed, 744488 remain, 743666 are dead but not yet removable, oldest xmin: 969118077
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;buffer usage: 97603 hits, 0 misses, 5 dirtied
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;avg read rate: 0.000 MB/s, avg write rate: 0.028 MB/s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;system usage: CPU: user: 0.21 s, system: 0.22 s, elapsed: 1.41 s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;WAL usage: 4 records, 3 full page images, 5129 bytes&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;autovacuum worker&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;autovacuum triggered but reclaimed nothing: &lt;code&gt;tuples: 0 removed, 744488 remain, 743666 are dead but not yet removable, oldest xmin: 969118077&lt;/code&gt;. &lt;code&gt;oldest xmin&lt;/code&gt; represents the oldest transaction in the database — meaning there&amp;rsquo;s a long-running transaction. This is easy to find:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,xact_start,state_change,wait_event,&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;,query &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idle&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; xact_start ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+------------+-------------------------------+-------------------------------+---------------------+---------------------+------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;164658&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; phbdspsqp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;275408&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;299609&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DataFileRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;minval&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;maxval&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(ID) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; minval,&lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;(TRACK&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The long transaction was a SQL that had been running since around 8 AM that morning, for several hours. Even though it wasn&amp;rsquo;t on the same table, being the &lt;code&gt;oldest xmin&lt;/code&gt; it still had an impact.&lt;/p&gt;
&lt;p&gt;At this point the root cause is identified:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Table A had frequent updates, high bloat risk&lt;/li&gt;
&lt;li&gt;A long transaction on table B prevented dead tuple reclamation on table A&lt;/li&gt;
&lt;li&gt;Table A&amp;rsquo;s update statements scanned excessive pages&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Solution:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Kill the long transaction: &lt;code&gt;select pg_terminate_backend(164658)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Manually vacuum or wait 1 minute (or less) for automatic vacuum: &lt;code&gt;vacuum table_a&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After both steps were completed, checking dead tuples:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_live_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;707&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_dead_tup &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;298&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;650K dead tuples have been cleaned up.&lt;/p&gt;
&lt;p&gt;Checking the execution plan again:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; table_a &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; column_id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; pk_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;621&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;026&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;029&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((column_id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;d4f713370e584820a9b15e2218ea436a&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;057&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Shared hits down to just 6 — issue resolved.&lt;/p&gt;
&lt;p&gt;Additionally, vacuum only reclaims dead tuples but does not shrink the table — the table remains the same size. Space can only be returned to the OS when new data reuses those pages, or through a repack/table rebuild:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;525&lt;/span&gt; MB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h1 class="relative group"&gt;Bonus SQL Optimization — ORDER BY LIMIT
 &lt;div id="bonus-sql-optimization--order-by-limit" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bonus-sql-optimization--order-by-limit" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h1&gt;
&lt;p&gt;That long-running transaction SQL also had its own problems&amp;hellip;
The business reported it ran fast a few days ago but took several hours today:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(ID) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; minval,&lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;(ID) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; maxval &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; table_b &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Result&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4298&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4298&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2149&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; pk_b &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1181490202&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;549896&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((ID)::text &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2149&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;Backward&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; pk_b &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_b table_b_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1181490202&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;549896&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((ID)::text &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SQL is also simple — only one condition on a time column, with decent selectivity.
However, this SQL did not use the &lt;code&gt;time_at&lt;/code&gt; index but instead used the &lt;code&gt;ID&lt;/code&gt; primary key index. This is the same &lt;a href="https://blog.csdn.net/qq_40687433/article/details/134387782?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;LIMIT problem&lt;/a&gt;. Running ANALYZE is useless here — it&amp;rsquo;s better to rewrite the SQL.&lt;/p&gt;
&lt;p&gt;After rewriting, the result came back instantly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(ID&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; minval,&lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;(ID&lt;span style="color:#f92672"&gt;||&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; maxval &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; table_b &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1201418&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;86&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1201418&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;87&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_time_at &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1195919&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;549896&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (time_at &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; to_timestamp(&lt;span style="color:#e6db74"&gt;&amp;#39;2024-03-30 00:00:00&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyy-MM-dd HH24:mi:ss&amp;#39;&lt;/span&gt;::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This isn&amp;rsquo;t really an execution plan regression, because the plan didn&amp;rsquo;t change. A few days ago it had the same plan but ran fast — the reason is tied to data distribution and the LIMIT mechanism: when data is quickly found, it returns immediately (which is why the optimizer chose the primary key index); when it&amp;rsquo;s &amp;ldquo;unlucky&amp;rdquo; and the matching data is far away, it takes a very long time.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A classic case:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A small table with frequent updates&lt;/li&gt;
&lt;li&gt;A long transaction preventing dead tuple reclamation&lt;/li&gt;
&lt;li&gt;The long transaction itself was caused by an index selection problem due to sorting and LIMIT operations (ORDER BY, MAX/MIN, GROUP can all trigger this)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;One incident, three classic PostgreSQL knowledge points — quite representative.&lt;/p&gt;</content:encoded></item><item><title>A Deep Dive into PostgreSQL Transactions</title><link>https://lastdba.com/en/2024/08/12/a-deep-dive-into-postgresql-transactions/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/a-deep-dive-into-postgresql-transactions/</guid><description>&lt;p&gt;&lt;strong&gt;PostgreSQL Transactions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To guarantee ACID properties, an RDBMS must implement concurrency control. PostgreSQL, like Oracle and MySQL (InnoDB), uses MVCC (Multi-Version Concurrency Control) for concurrency control. MVCC works by continuously generating new versions of objects as data changes while allowing queries to access a bounded range of older versions. It captures a snapshot of data at a given point in time and selects one version to read.&lt;/p&gt;
&lt;p&gt;Oracle and MySQL both use undo segments to record old versions of objects. PostgreSQL has no undo. Instead, during DML operations it writes historical data directly into the original table (UPDATE creates a new row, DELETE marks the row) and records additional columns — xmin and xmax — in the table to store transaction IDs. By comparing transaction IDs and other metadata, PostgreSQL implements its MVCC mechanism.&lt;/p&gt;</description><content:encoded>&lt;p&gt;&lt;strong&gt;PostgreSQL Transactions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To guarantee ACID properties, an RDBMS must implement concurrency control. PostgreSQL, like Oracle and MySQL (InnoDB), uses MVCC (Multi-Version Concurrency Control) for concurrency control. MVCC works by continuously generating new versions of objects as data changes while allowing queries to access a bounded range of older versions. It captures a snapshot of data at a given point in time and selects one version to read.&lt;/p&gt;
&lt;p&gt;Oracle and MySQL both use undo segments to record old versions of objects. PostgreSQL has no undo. Instead, during DML operations it writes historical data directly into the original table (UPDATE creates a new row, DELETE marks the row) and records additional columns — xmin and xmax — in the table to store transaction IDs. By comparing transaction IDs and other metadata, PostgreSQL implements its MVCC mechanism.&lt;/p&gt;
&lt;p&gt;Among relational databases, PostgreSQL&amp;rsquo;s transaction mechanism is truly distinctive. Understanding it is key to grasping how PostgreSQL operates under the hood.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction Isolation Levels
 &lt;div id="transaction-isolation-levels" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-isolation-levels" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Most relational databases support multiple transaction isolation levels. Under different isolation levels, concurrent transaction behavior varies.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Setting the Transaction Isolation Level
 &lt;div id="setting-the-transaction-isolation-level" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#setting-the-transaction-isolation-level" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL supports four isolation levels (though only three are actually effective):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SERIALIZABLE&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;REPEATABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;READ&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;READ&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMITTED&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;READ&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UNCOMMITTED&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Isolation level parameters&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;default_transaction_isolation&lt;/code&gt;: sets the default isolation level for all transactions globally.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;transaction_isolation&lt;/code&gt;: displays the isolation level of the current session.&lt;/p&gt;
&lt;p&gt;The default isolation level is &lt;code&gt;read committed&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Changing the global default isolation level&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Modify the &lt;code&gt;default_transaction_isolation&lt;/code&gt; parameter and &lt;code&gt;reload&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;postgres=# alter system set default_transaction_isolation to &amp;#39;serializable&amp;#39;;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
 (1 row)
 postgres=# show transaction_isolation;
 transaction_isolation 
-----------------------
 serializable&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After the change, every new transaction will use the &lt;code&gt;default_transaction_isolation&lt;/code&gt; isolation level.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Setting the session isolation level&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Note: &lt;code&gt;transaction_isolation&lt;/code&gt; only displays the current session&amp;rsquo;s isolation level. This parameter cannot be modified directly.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# alter system set transaction_isolation to &amp;#39;REPEATABLE READ&amp;#39;;
ERROR: parameter &amp;#34;transaction_isolation&amp;#34; cannot be changed&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Use &lt;code&gt;SET SESSION&lt;/code&gt; to change the session&amp;rsquo;s isolation level:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET
lzldb=# show transaction_isolation ;
-[ RECORD 1 ]---------+----------------
transaction_isolation | repeatable read&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Setting the transaction-level isolation level&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL allows specifying the isolation level for an individual transaction. You can set it when starting the transaction:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
lzldb=# start TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Or use &lt;code&gt;set transaction&lt;/code&gt; after starting a transaction:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# begin;
BEGIN
lzldb=*# set transaction ISOLATION LEVEL REPEATABLE READ;
SET&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;ANSI-92 Transaction Isolation Levels
 &lt;div id="ansi-92-transaction-isolation-levels" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ansi-92-transaction-isolation-levels" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The &lt;em&gt;ANSI SQL-92&lt;/em&gt; standard defines four isolation levels:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Serializable&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;All transactions in the system execute serially, without interfering with each other. Executing transactions one after another avoids all data inconsistency scenarios.&lt;/p&gt;
&lt;p&gt;Early implementations used exclusive locks to control concurrent transactions. Serial execution caused queuing and dramatically reduced system concurrency. After ANSI-92, more serializable implementation methods emerged, greatly improving both concurrency and performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Repeatable Read&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Once a transaction begins, all data read during the transaction cannot be modified by other transactions. Repeatable Read is MySQL&amp;rsquo;s default isolation level.&lt;/p&gt;
&lt;p&gt;Note: in ANSI SQL, Repeatable Read can experience phantom reads, but PostgreSQL&amp;rsquo;s Repeatable Read does not.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Read Committed&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A transaction can read data committed by other transactions. If a transaction reads a piece of data multiple times and that data happens to be modified and committed by another transaction in between, the current transaction will see different values for the same data. This is the default isolation level for both Oracle and PostgreSQL.&lt;/p&gt;
&lt;p&gt;At this isolation level, both &amp;ldquo;non-repeatable read&amp;rdquo; and &amp;ldquo;phantom read&amp;rdquo; scenarios can occur.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Read Uncommitted&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A transaction can read data that has been modified but not yet committed by other transactions. Since uncommitted data can still be rolled back, reading such data leads to &amp;ldquo;dirty reads.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;At this isolation level, &amp;ldquo;dirty read&amp;rdquo; scenarios can occur.&lt;/p&gt;
&lt;p&gt;PostgreSQL does not have a Read Uncommitted isolation level. Setting Read Uncommitted is treated as Read Committed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Standard concurrency phenomena and isolation level matrix&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Isolation Level&lt;/th&gt;
 &lt;th&gt;Dirty Read&lt;/th&gt;
 &lt;th&gt;Non-repeatable Read&lt;/th&gt;
 &lt;th&gt;Phantom Read&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Read Uncommitted&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read Committed&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Repeatable Read&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Serializable&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL concurrency phenomena and isolation level matrix&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Isolation Level&lt;/th&gt;
 &lt;th&gt;Dirty Read&lt;/th&gt;
 &lt;th&gt;Non-repeatable Read&lt;/th&gt;
 &lt;th&gt;Phantom Read&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Read Uncommitted&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Read Committed&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;td&gt;Possible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Repeatable Read&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;Serializable&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;td&gt;Impossible&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 class="relative group"&gt;A Brief History of Transaction Isolation Levels
 &lt;div id="a-brief-history-of-transaction-isolation-levels" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-brief-history-of-transaction-isolation-levels" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The isolation levels and anomaly phenomena defined by ANSI SQL-92 have had a profound impact on the database industry. Even today, over 30 years later, most engineers&amp;rsquo; understanding of transaction isolation levels still revolves around them, and many real-world database isolation level implementations still follow them. However, the post-ANSI-92 era has seen much discussion and even criticism regarding isolation levels. Here is a summary of the key historical developments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;1992&lt;/strong&gt;: The database industry was in a chaotic state regarding transactions, so ANSI defined the SQL-92 standard — the widely known 4 isolation levels and 4 anomaly phenomena.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;1995&lt;/strong&gt;: Snapshot Isolation and other isolation levels were proposed, along with more anomaly phenomena. Microsoft engineers proposed the Snapshot Isolation level and criticized ANSI SQL-92, noting that the standard was vaguely defined and many isolation levels and anomalies were left undefined. See &lt;a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf" target="_blank" rel="noreferrer"&gt;&lt;em&gt;A Critique of ANSI SQL Isolation Levels&lt;/em&gt;&lt;/a&gt;. By this point, there were more than 4 isolation levels and more anomaly phenomena, including write skew.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;1999&lt;/strong&gt;: Due to the proliferation of lock-based isolation levels, &lt;a href="http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-786.pdf" target="_blank" rel="noreferrer"&gt;Atul Adya&amp;rsquo;s paper&lt;/a&gt; organized these phenomena and mapped the various isolation levels back to ANSI SQL-92 based on anomaly phenomena and functionality.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;2005&lt;/strong&gt;: Because most databases claimed to be serializable but were actually Snapshot Isolation, Alan Fekete et al proposed &lt;a href="https://pdfs.semanticscholar.org/d658/2728e30011adfe27b329c35203dfb8d1e7a8.pdf" target="_blank" rel="noreferrer"&gt;&lt;em&gt;Making Snapshot Isolation Serializable&lt;/em&gt;&lt;/a&gt; — achieving serializability on top of Snapshot Isolation by eliminating its anomalies.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;2008&lt;/strong&gt;: Fekete extended serializability and proposed a database-level implementation called &lt;a href="https://cs.nyu.edu/courses/fall09/G22.2434-001/p729-cahill.pdf" target="_blank" rel="noreferrer"&gt;Serializable Snapshot Isolation (SSI)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;2012&lt;/strong&gt;: PostgreSQL became the first database to implement SSI. See the &lt;a href="https://drkp.net/papers/ssi-vldb12.pdf" target="_blank" rel="noreferrer"&gt;PostgreSQL SSI implementation paper&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Isolation levels and anomaly phenomena from the 1995 &lt;em&gt;Critique of ANSI SQL Isolation Levels&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b45dce972611.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Isolation Levels Supported by Various Databases
 &lt;div id="isolation-levels-supported-by-various-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#isolation-levels-supported-by-various-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Many databases claim &amp;ldquo;full ACID&amp;rdquo; compliance, but without serializability, ACID cannot be fully realized (especially consistency). Yet many databases claim ACID support even without serializability. The truth is, most do not fully implement it — including the veteran Oracle.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/588a66bd74bb.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Serializable
 &lt;div id="serializable" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#serializable" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;There are many misconceptions about serializability.&lt;/p&gt;
&lt;p&gt;The meaning of serializable: if each transaction is itself correct (satisfying certain integrity conditions), then any schedule that executes those transactions serially is also correct (the transactions still satisfy their conditions). &amp;ldquo;Serial&amp;rdquo; means transactions do not overlap in time and cannot interfere with each other — they are fully isolated.&lt;/p&gt;
&lt;p&gt;In the 1970s, serializability was achieved through Strict Two-Phase Locking (SS2PL), where reads and writes block each other until the transaction ends. SS2PL sacrifices high availability but eliminates anomaly phenomena.&lt;/p&gt;
&lt;p&gt;Beyond SS2PL, there are other ways to achieve serializability, such as Serializable Snapshot Isolation (SSI).&lt;/p&gt;
&lt;p&gt;To guarantee no anomalies, serializability sacrifices some concurrency (how much depends on the implementation), but it can truly guarantee data consistency (the &amp;ldquo;C&amp;rdquo; in ACID). In other words, databases that do not implement serializability do not fully support ACID.&lt;/p&gt;
&lt;p&gt;Serializability has been mathematically proven achievable, but the real database world is somewhat &amp;ldquo;abnormal.&amp;rdquo; In practice, serializability is the highest transaction isolation level and the one strongly recommended by academics and experts. However, the vast majority of databases run at Read Committed or Snapshot Isolation.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Do Weaker Isolation Levels Cause Academic Problems but Few Real-World Disasters?
 &lt;div id="why-do-weaker-isolation-levels-cause-academic-problems-but-few-real-world-disasters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-do-weaker-isolation-levels-cause-academic-problems-but-few-real-world-disasters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Anomalies in non-serializable isolation levels generally require high concurrency. Low-concurrency databases rarely encounter problems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;When anomalies do occur, some applications may not detect them or may not consider them important.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It is possible that data becomes anomalous but the application simply returns an error and enters exception-handling logic.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Cost is too high. Not only is the development cost of serializable isolation high for the database, but applications also need to adapt. Simply understanding this complex theory is no easy task.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Higher isolation levels lose some performance. Extensive rework may not be worth it; applications must choose between &amp;ldquo;high concurrency&amp;rdquo; and &amp;ldquo;freedom from anomalies.&amp;rdquo;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Business logic is built around mechanisms, not rules. Applications have somewhat adapted to the anomalies of weaker isolation levels, especially Read Committed or Snapshot Isolation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Snapshot Isolation
 &lt;div id="snapshot-isolation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-isolation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ANSI SQL-92 did not define Snapshot Isolation (SI). This isolation level emerged as the database industry evolved.&lt;/p&gt;
&lt;p&gt;Quoting the Wikipedia definition: a transaction executing under Snapshot Isolation operates on a snapshot of the database taken at the start of the transaction. When the transaction ends, it will only commit successfully if the values it updated have not been externally changed since the snapshot was taken. Write conflicts thus cause transaction aborts.&lt;/p&gt;
&lt;p&gt;As the name implies, Snapshot Isolation uses snapshots. It exists in databases that use MVCC, where the multi-version concurrency mechanism supports concurrent transaction execution.&lt;/p&gt;
&lt;p&gt;The 1992 ANSI SQL-92 standard was defined based on database locks, so it did not define Snapshot Isolation. The concept only emerged with the 1995 &lt;em&gt;Critique&lt;/em&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Serializable Snapshot Isolation
 &lt;div id="serializable-snapshot-isolation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#serializable-snapshot-isolation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Due to the widespread adoption of Snapshot Isolation and the academic goal that databases should achieve serializability, Serializable Snapshot Isolation (SSI) was born. As the name suggests, it achieves serializability on top of Snapshot Isolation.&lt;/p&gt;
&lt;p&gt;Because of the ambiguity of the ANSI-92 standard, although Snapshot Isolation was not defined, many databases actually use it. Snapshot Isolation also has certain anomaly phenomena (including write skew), and SSI was created to resolve them.&lt;/p&gt;
&lt;p&gt;Mainstream databases implement concurrency control via S2PL or MVCC. Under S2PL, write operations block reads and writes from other transactions, so there is no write skew. MVCC, however, allows reads and writes not to block each other — only write-write conflicts. In concurrent read-write patterns, this leads to write skew. Starting from PostgreSQL 9.1, SSI has been embedded into Snapshot Isolation (PostgreSQL only has Snapshot Isolation, even at the serializable level), resolving write skew and other anomalies.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Write Skew
 &lt;div id="write-skew" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#write-skew" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When certain conflicts form a cycle, serialization anomalies occur. One of the easier ones to understand is &lt;strong&gt;write skew&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Write skew only happens in read-write patterns (not write-write or write-read), and only under concurrent conditions. A dependency cycle forms when a preceding transaction&amp;rsquo;s write depends on a later transaction&amp;rsquo;s write.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2e661194aa05.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;There are many real-world cases of write skew. Let&amp;rsquo;s understand it through the classic &lt;strong&gt;black-and-white ball problem&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;A bag contains 10 balls: 5 white and 5 black. Two transactions, P and Q, are running. P changes all black balls to white; Q changes all white balls to black. There are two possible serial executions: P then Q, or Q then P. In both cases, the final result is either 10 white balls or 10 black balls. However, Snapshot Isolation allows another outcome:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transaction P picks up 5 black balls&lt;/li&gt;
&lt;li&gt;Transaction Q picks up 5 white balls&lt;/li&gt;
&lt;li&gt;Transaction P changes all the balls in hand to white and puts them back&lt;/li&gt;
&lt;li&gt;Transaction Q changes all the balls in hand to black and puts them back&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Now the bag still has 5 black and 5 white balls — an outcome impossible in any serial execution. Yet this is valid under Snapshot Isolation: each transaction maintains a consistent view of the database, and its write set does not overlap with any concurrent transaction&amp;rsquo;s write set. Hence, the black and white balls are swapped.&lt;/p&gt;
&lt;p&gt;The black-and-white ball problem illustrates: the result under Snapshot Isolation is inconsistent with the result under serial execution. Write skew occurs under Snapshot Isolation, and the data outcome does not match expectations.&lt;/p&gt;

&lt;h3 class="relative group"&gt;SSI in PostgreSQL
 &lt;div id="ssi-in-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ssi-in-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL was the first database to implement SSI. Here is the black-and-white ball example using the Wikipedia code:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; dots
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id int &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; color text &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; dots
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; x(id) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id, &lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; id &lt;span style="color:#f92672"&gt;%&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;black&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;white&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; x;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: left"&gt;set default_transaction_isolation = &amp;lsquo;serializable&amp;rsquo;;&lt;/th&gt;
 &lt;th style="text-align: left"&gt;set default_transaction_isolation = &amp;lsquo;serializable&amp;rsquo;;&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;begin; &lt;br /&gt;update dots set color = &amp;lsquo;black&amp;rsquo; where color = &amp;lsquo;white&amp;rsquo;;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;begin; &lt;br /&gt; update dots set color = &amp;lsquo;white&amp;rsquo; where color = &amp;lsquo;black&amp;rsquo;;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;commit&lt;/td&gt;
 &lt;td style="text-align: left"&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;commit&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;&lt;em&gt;(PostgreSQL SSI: first committer succeeds, second throws an error)&lt;/em&gt;&lt;/td&gt;
 &lt;td style="text-align: left"&gt;ERROR: could not serialize access due to read/write dependencies among transactions DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt. HINT: The transaction might succeed if retried.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;(At Read Committed and Repeatable Read, no error is thrown; the black and white balls simply swap colors. Test results omitted.)&lt;/p&gt;
&lt;p&gt;Strict Two-Phase Locking (S2PL) can also achieve serializability, but S2PL requires heavy read-write locks held until transaction commit. S2PL severely impacts concurrency performance, and users generally won&amp;rsquo;t accept reads and writes blocking each other, so PostgreSQL does not use S2PL.&lt;/p&gt;
&lt;p&gt;SSI is an alternative approach to serializability. It still uses Snapshot Isolation but additionally checks for anomaly phenomena. The two approaches also handle anomalies differently: when one occurs, S2PL blocks transactions, while SSI aborts a transaction to break the cycle.&lt;/p&gt;
&lt;p&gt;One reason people avoid serializability is that it supposedly reduces database performance. This is understandable — SSI, which performs &amp;ldquo;anomaly checks,&amp;rdquo; must be slower than weaker isolation levels that do no such checking. However, with advances in SSI implementation theory and PostgreSQL&amp;rsquo;s optimizations for read-only transactions, SSI&amp;rsquo;s performance is now on par with SI.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7d32ee35fdba.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Serializability greatly simplifies applications&amp;rsquo; consistency concerns. PostgreSQL 9.1 has implemented SSI with optimizations. Let&amp;rsquo;s hope applications will one day truly adopt the serializable isolation level.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction Isolation Level References
 &lt;div id="transaction-isolation-level-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-isolation-level-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/SSI" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/SSI&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Serializability" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Serializability&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Snapshot_isolation" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Snapshot_isolation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://justinjaffray.com/what-does-write-skew-look-like/" target="_blank" rel="noreferrer"&gt;https://justinjaffray.com/what-does-write-skew-look-like/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.bailis.org/blog/when-is-acid-acid-rarely/" target="_blank" rel="noreferrer"&gt;http://www.bailis.org/blog/when-is-acid-acid-rarely/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf" target="_blank" rel="noreferrer"&gt;https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf&lt;/a&gt; — 1995 paper on SI isolation levels and critique of SQL-92&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/2009/Papers/p492-fekete.pdf" target="_blank" rel="noreferrer"&gt;https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/2009/Papers/p492-fekete.pdf&lt;/a&gt; — SSI paper&lt;/p&gt;
&lt;p&gt;&lt;a href="https://drkp.net/papers/ssi-vldb12.pdf" target="_blank" rel="noreferrer"&gt;https://drkp.net/papers/ssi-vldb12.pdf&lt;/a&gt; — PostgreSQL SSI implementation&lt;/p&gt;
&lt;p&gt;&lt;a href="https://ristret.com/s/f643zk/history_transaction_histories" target="_blank" rel="noreferrer"&gt;https://ristret.com/s/f643zk/history_transaction_histories&lt;/a&gt; — History of transaction isolation levels&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction Processing
 &lt;div id="transaction-processing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-processing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Transaction Blocks
 &lt;div id="transaction-blocks" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-blocks" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Transactions can be implicit or explicit. An implicit transaction is a standalone SQL statement that auto-commits upon completion. An explicit transaction requires an explicit declaration; multiple SQL statements grouped together form a transaction block.&lt;/p&gt;
&lt;p&gt;Transaction blocks begin with &lt;code&gt;begin&lt;/code&gt;, &lt;code&gt;begin transaction&lt;/code&gt;, or &lt;code&gt;start transaction&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;They end with &lt;code&gt;COMMIT&lt;/code&gt;, &lt;code&gt;END&lt;/code&gt;, or &lt;code&gt;ABORT&lt;/code&gt;, &lt;code&gt;ROLLBACK&lt;/code&gt;, where &lt;code&gt;COMMIT=END&lt;/code&gt; and &lt;code&gt;ABORT=ROLLBACK&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If an error occurs during a transaction block, the transaction can only be rolled back due to atomicity:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: relation &lt;span style="color:#e6db74"&gt;&amp;#34;lzl2&amp;#34;&lt;/span&gt; does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; exist
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LINE &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;^&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=!#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Transaction Processing Functions
 &lt;div id="transaction-processing-functions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-processing-functions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Transaction processing functions are organized into three layers: top-level transaction functions, middle-level transaction functions, and bottom-level transaction functions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Top-level transaction functions&lt;/strong&gt; handle transaction block commands like &lt;code&gt;BEGIN&lt;/code&gt;, &lt;code&gt;COMMIT&lt;/code&gt;, &lt;code&gt;ROLLBACK&lt;/code&gt;, &lt;code&gt;SAVEPOINT&lt;/code&gt;, etc.:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;BeginTransactionBlock&lt;/th&gt;
 &lt;th&gt;Start a transaction block&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;EndTransactionBlock&lt;/td&gt;
 &lt;td&gt;End a transaction block&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;UserAbortTransactionBlock&lt;/td&gt;
 &lt;td&gt;User-initiated transaction abort&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;DefineSavepoint&lt;/td&gt;
 &lt;td&gt;Create a savepoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;RollbackToSavepoint&lt;/td&gt;
 &lt;td&gt;Roll back to a savepoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;ReleaseSavepoint&lt;/td&gt;
 &lt;td&gt;Release a savepoint&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Middle-level transaction functions&lt;/strong&gt;: every SQL statement calls middle-level functions before and after execution, including after detecting an exception:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;StartTransactionCommand&lt;/th&gt;
 &lt;th&gt;Start a transaction command&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;CommitTransactionCommand&lt;/td&gt;
 &lt;td&gt;Complete a transaction command (not commit)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AbortCurrentTransaction&lt;/td&gt;
 &lt;td&gt;Abort the current transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Bottom-level transaction functions&lt;/strong&gt;: the actual transaction processing functions, responsible for maintaining transaction state, allocating and reclaiming transaction resources, etc.:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;StartTransaction&lt;/th&gt;
 &lt;th&gt;Start a transaction&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;CommitTransaction&lt;/td&gt;
 &lt;td&gt;Commit a transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AbortTransaction&lt;/td&gt;
 &lt;td&gt;Rollback/abort a transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CleanupTransaction&lt;/td&gt;
 &lt;td&gt;Clean up a transaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;StartSubTransaction&lt;/td&gt;
 &lt;td&gt;Start a subtransaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CommitSubTransaction&lt;/td&gt;
 &lt;td&gt;Commit a subtransaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;AbortSubTransaction&lt;/td&gt;
 &lt;td&gt;Rollback/abort a subtransaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;CleanupSubTransaction&lt;/td&gt;
 &lt;td&gt;Clean up a subtransaction&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These functions are fairly easy to distinguish. Aside from a few special functions (top-level &lt;code&gt;savepoint&lt;/code&gt;-related, middle-level &lt;code&gt;abort&lt;/code&gt; function), the three layers are organized as: *Block (transaction block functions), *Command (command functions), and *Transaction (actual transaction processing functions). Savepoints/subtransactions are treated as transaction-block-level functions (subtransactions can be rolled back within a transaction block, so placing them at the block level makes sense), and abort is treated as a command-level function.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction Block States
 &lt;div id="transaction-block-states" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-block-states" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Top-level and middle-level functions jointly control the transaction block state; bottom-level functions control the transaction state.&lt;/p&gt;
&lt;p&gt;Both transaction block states and transaction states are in &lt;code&gt;src/backend/access/transam/xact.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt; TBlockState
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* states not in a transaction block */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_DEFAULT, &lt;span style="color:#75715e"&gt;/* idle state; entering or exiting a transaction returns to this state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_STARTED, &lt;span style="color:#75715e"&gt;/* just entered a transaction block; transitions from TBLOCK_DEFAULT; short-lived */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* transaction block states */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_BEGIN, &lt;span style="color:#75715e"&gt;/* start a transaction block; at this point data block is started, entering block-level state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_INPROGRESS, &lt;span style="color:#75715e"&gt;/* active transaction; after BEGIN, the block stays in this state until transaction ends */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_IMPLICIT_INPROGRESS, &lt;span style="color:#75715e"&gt;/* active transaction with an implicit BEGIN */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_PARALLEL_INPROGRESS, &lt;span style="color:#75715e"&gt;/* active transaction in parallel execution */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_END, &lt;span style="color:#75715e"&gt;/* received COMMIT command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_ABORT, &lt;span style="color:#75715e"&gt;/* transaction failed, waiting for ROLLBACK */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_ABORT_END, &lt;span style="color:#75715e"&gt;/* transaction failed, received ROLLBACK */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_ABORT_PENDING, &lt;span style="color:#75715e"&gt;/* active transaction, received ROLLBACK */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_PREPARE, &lt;span style="color:#75715e"&gt;/* active transaction, received PREPARE (explicit 2PC) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* subtransaction states (still transaction-block level) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBBEGIN, &lt;span style="color:#75715e"&gt;/* start a subtransaction */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBINPROGRESS, &lt;span style="color:#75715e"&gt;/* active subtransaction */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBRELEASE, &lt;span style="color:#75715e"&gt;/* received RELEASE (release savepoint) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBCOMMIT, &lt;span style="color:#75715e"&gt;/* parent transaction COMMIT while subtransaction is still running (SUBINPROGRESS) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBABORT, &lt;span style="color:#75715e"&gt;/* failed subtransaction, waiting for rollback command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBABORT_END, &lt;span style="color:#75715e"&gt;/* failed subtransaction, received rollback command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBABORT_PENDING, &lt;span style="color:#75715e"&gt;/* active subtransaction, received rollback command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBRESTART, &lt;span style="color:#75715e"&gt;/* active subtransaction, received rollback to command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TBLOCK_SUBABORT_RESTART &lt;span style="color:#75715e"&gt;/* failed subtransaction, received ROLLBACK TO command */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} TBlockState;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Most states are self-explanatory. A note on rollback vs. abort: their subsequent behavior is similar — both need to clean up transaction resources and exit the current transaction. Yet PostgreSQL separates them into two behaviors with two states: &lt;code&gt;TBLOCK_ABORT&lt;/code&gt; and &lt;code&gt;TBLOCK_ABORT_END&lt;/code&gt; (and similarly for subtransactions). Why?&lt;/p&gt;
&lt;p&gt;&lt;code&gt;src/backend/access/transam/README&lt;/code&gt; offers a detailed explanation:&lt;/p&gt;
&lt;blockquote&gt;&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Scenario 1&lt;/th&gt;
 &lt;th&gt;Scenario 2&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1) User types &lt;code&gt;BEGIN&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;1) User types &lt;code&gt;BEGIN&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2) User executes some commands&lt;/td&gt;
 &lt;td&gt;2) User executes some commands&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3) User doesn&amp;rsquo;t like what she sees, types &lt;code&gt;ABORT&lt;/code&gt;&lt;/td&gt;
 &lt;td&gt;3) The transaction system aborts for some reason (syntax error, etc.)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In Scenario 1, we want to abort the transaction and return to the default state.&lt;/p&gt;
&lt;p&gt;In Scenario 2, more commands may follow that are still part of the current transaction block. We must ignore these commands until we see &lt;code&gt;COMMIT&lt;/code&gt; or &lt;code&gt;ROLLBACK&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;AbortCurrentTransaction&lt;/code&gt; handles internal transaction aborts; &lt;code&gt;UserAbortTransactionBlock&lt;/code&gt; handles user-initiated aborts. Both rely on &lt;code&gt;AbortTransaction&lt;/code&gt; to do all the real work. The only difference is what state we enter after &lt;code&gt;AbortTransaction&lt;/code&gt; finishes:&lt;/p&gt;
&lt;p&gt;* AbortCurrentTransaction leaves us in TBLOCK_ABORT&lt;/p&gt;
&lt;p&gt;* UserAbortTransactionBlock leaves us in TBLOCK_ABORT_END&lt;/p&gt;
&lt;p&gt;Bottom-level transaction abort processing has two phases:&lt;/p&gt;
&lt;p&gt;* As soon as we realize the transaction has failed, &lt;code&gt;AbortTransaction&lt;/code&gt; is executed. This should release all shared resources (locks, etc.) to avoid unnecessarily increasing latency for other backends.&lt;/p&gt;
&lt;p&gt;* When we finally see the user&amp;rsquo;s &lt;code&gt;COMMIT&lt;/code&gt; or &lt;code&gt;ROLLBACK&lt;/code&gt;, &lt;code&gt;CleanupTransaction&lt;/code&gt; is executed; this function cleans up resources and gets us completely out of the transaction. In particular, we cannot destroy &lt;code&gt;TopTransactionContext&lt;/code&gt; before this point.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3 class="relative group"&gt;Transaction States
 &lt;div id="transaction-states" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-states" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Transaction states are straightforward (note: these are different from transaction block states):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt; TransState
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_DEFAULT, &lt;span style="color:#75715e"&gt;/* idle */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_START, &lt;span style="color:#75715e"&gt;/* transaction started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_INPROGRESS, &lt;span style="color:#75715e"&gt;/* active transaction */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_COMMIT, &lt;span style="color:#75715e"&gt;/* transaction commit */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_ABORT, &lt;span style="color:#75715e"&gt;/* abort transaction */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TRANS_PREPARE &lt;span style="color:#75715e"&gt;/* prepare transaction (2PC) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} TransState;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Transaction State Flow
 &lt;div id="transaction-state-flow" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-state-flow" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Each command in a transaction block calls transaction functions, which in turn transition the transaction and transaction block states.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s use the simplest transaction block as an example (from the README):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; foo
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; foo &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (...)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Command call relationships:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; / StartTransactionCommand; -- middle-level: start transaction command
 / StartTransaction; -- bottom-level: actually start the transaction
 1)&amp;lt; ProcessUtility; -- ProcessUtility handles the BEGIN command
 \ BeginTransactionBlock; -- top-level: start transaction block
 \ CommitTransactionCommand; -- middle-level: complete command

 / StartTransactionCommand; -- middle-level: start transaction command
2) / PortalRunSelect; -- execute SELECT statement
 \ CommitTransactionCommand; -- middle-level: complete command
 \ CommandCounterIncrement; -- middle-level: command counter increment

 / StartTransactionCommand; -- middle-level: start transaction command
3) / ProcessQuery; -- execute INSERT statement
 \ CommitTransactionCommand; -- middle-level: complete command
 \ CommandCounterIncrement; -- command counter +1

 / StartTransactionCommand; -- middle-level: start transaction command
 / ProcessUtility; -- ProcessUtility handles COMMIT command
4) &amp;lt; EndTransactionBlock; -- top-level: end transaction block
 \ CommitTransactionCommand; -- middle-level: complete command
 \ CommitTransaction; -- bottom-level: actually commit the transaction
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;Every command in a transaction block begins with the middle-level &lt;code&gt;StartTransactionCommand&lt;/code&gt; and ends with &lt;code&gt;CommitTransactionCommand&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Between these two middle-level functions is where the actual command processing occurs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The transaction block state for 2) SELECT and 3) INSERT is &lt;code&gt;TBLOCK_INPROGRESS&lt;/code&gt;. The state transitions for &lt;code&gt;BEGIN&lt;/code&gt; and &lt;code&gt;COMMIT&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b8f307da3f3f.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction Function References
 &lt;div id="transaction-function-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-function-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;PostgreSQL Internals&lt;/em&gt; (book)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;src/backend/access/transam/README&lt;/code&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction ID
 &lt;div id="transaction-id" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Every transaction in PostgreSQL is assigned a transaction ID. Transaction IDs come in two forms: virtual transaction IDs and persistent transaction IDs. Understanding transaction IDs is crucial for grasping transactions, data visibility, transaction ID wraparound, and more.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Virtual Transaction ID
 &lt;div id="virtual-transaction-id" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#virtual-transaction-id" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Read-only transactions are not assigned a transaction ID — transaction IDs are a precious resource. A simple SELECT, for instance, won&amp;rsquo;t consume one. However, to identify transactions for purposes such as shared locks, a non-persistent transaction ID is needed. This is the virtual transaction ID (VXID).&lt;/p&gt;
&lt;p&gt;VXID consists of two parts: a backend ID and a backend-local counter.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/storage/lock.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;BackendId backendId; &lt;span style="color:#75715e"&gt;/* backendId from PGPROC */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LocalTransactionId localTransactionId; &lt;span style="color:#75715e"&gt;/* lxid from PGPROC */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} VirtualTransactionId;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(PGPROC is a structure storing process information; we&amp;rsquo;ll cover it later.)&lt;/p&gt;
&lt;p&gt;You can see VXID in &lt;code&gt;pg_locks&lt;/code&gt;. Querying &lt;code&gt;pg_locks&lt;/code&gt; itself is a SQL statement, so it generates a VXID:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,virtualxid,virtualtransaction,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+--------------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation		&lt;span style="color:#f92672"&gt;|&lt;/span&gt; 		 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; 	 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; savepoint p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,virtualxid,virtualtransaction,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype 	&lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+--------------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation 	&lt;span style="color:#f92672"&gt;|&lt;/span&gt; 		 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,virtualxid,virtualtransaction,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype 	&lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+--------------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation 	&lt;span style="color:#f92672"&gt;|&lt;/span&gt; 	 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After &lt;code&gt;\q&lt;/code&gt; (disconnect) and immediately logging back in, the counter continues: &lt;code&gt;4/19&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Opening another window gives &lt;code&gt;backendID+1&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,virtualxid,virtualtransaction,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+--------------------+-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From these tests we can observe:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The VXID&amp;rsquo;s backend ID is not the actual process PID; it&amp;rsquo;s simply an incrementing number.&lt;/li&gt;
&lt;li&gt;Both the VXID&amp;rsquo;s backend ID and command counter are incrementing.&lt;/li&gt;
&lt;li&gt;Subtransactions do not have their own VXID; they use the parent transaction&amp;rsquo;s VXID.&lt;/li&gt;
&lt;li&gt;VXID also has wraparound, but it&amp;rsquo;s not a serious issue since it isn&amp;rsquo;t persisted — after an instance restart, VXID starts counting from scratch.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Persistent Transaction ID
 &lt;div id="persistent-transaction-id" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#persistent-transaction-id" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;32-bit TransactionId
 &lt;div id="32-bit-transactionid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#32-bit-transactionid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When a data-modifying transaction begins, the transaction manager assigns it a unique identifier: &lt;code&gt;TransactionId&lt;/code&gt;. &lt;code&gt;TransactionId&lt;/code&gt; is a 32-bit unsigned integer, capable of storing &lt;code&gt;2^32 = 4,294,967,296&lt;/code&gt; — about 4.2 billion — transactions. The range of a 32-bit unsigned integer is &lt;code&gt;0 ~ 2^32 - 1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Three special transaction IDs&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;src/include/access/transam.h&lt;/code&gt; defines several special transaction IDs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define InvalidTransactionId ((TransactionId) 0)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define BootstrapTransactionId ((TransactionId) 1)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FrozenTransactionId ((TransactionId) 2)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FirstNormalTransactionId ((TransactionId) 3)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MaxTransactionId ((TransactionId) 0xFFFFFFFF)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;0: Invalid TransactionId&lt;/li&gt;
&lt;li&gt;1: Bootstrap Transaction ID, used only during database initialization. Older than all normal transactions.&lt;/li&gt;
&lt;li&gt;2: Frozen Transaction ID. Older than all normal transactions.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdIsNormal(xid) ((xid) &amp;gt;= FirstNormalTransactionId)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A transaction ID &amp;gt;= 3 is a normal transaction ID.&lt;/p&gt;
&lt;p&gt;The maximum transaction ID, &lt;code&gt;MaxTransactionId&lt;/code&gt;, is &lt;code&gt;0xFFFFFFFF = 4,294,967,295 = 2^32 - 1&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So the allocatable range for normal transaction IDs is: &lt;code&gt;3 ~ 2^32 - 1&lt;/code&gt;.&lt;/p&gt;

&lt;h4 class="relative group"&gt;64-bit FullTransactionId
 &lt;div id="64-bit-fulltransactionid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#64-bit-fulltransactionid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Transaction IDs increment sequentially. PostgreSQL has used 32-bit transaction IDs for a long time. Before PostgreSQL 7.2, when the 32-bit transaction ID was exhausted, you had to dump and restore the database. A 64-bit transaction ID, on the other hand, is practically inexhaustible. The source defines a 64-bit &lt;code&gt;FullTransactionId&lt;/code&gt; as a struct:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *A 64-bit value containing an epoch and a TransactionId.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *It is wrapped in a struct to prevent implicit conversion to TransactionId.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *Not all values represent valid normal XIDs.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; FullTransactionId
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uint64 value;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} FullTransactionId;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The 64-bit value consists of an &lt;code&gt;epoch&lt;/code&gt; and a 32-bit &lt;code&gt;TransactionId&lt;/code&gt;, converted via these functions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define EpochFromFullTransactionId(x)	((uint32) ((x).value &amp;gt;&amp;gt; 32))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XidFromFullTransactionId(x)		((uint32) (x).value)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The epoch is &lt;code&gt;FullTransactionId&lt;/code&gt; shifted right 32 bits; the XID (&lt;code&gt;TransactionId&lt;/code&gt;) is &lt;code&gt;FullTransactionId&lt;/code&gt; modulo &lt;code&gt;2^32&lt;/code&gt;. This is like treating the 32-bit &lt;code&gt;TransactionId&lt;/code&gt; as a &amp;ldquo;circle&amp;rdquo; that loops, while the 64-bit &lt;code&gt;FullTransactionId&lt;/code&gt; is a &amp;ldquo;line&amp;rdquo; that keeps growing, nearly inexhaustible.&lt;/p&gt;
&lt;p&gt;A full transaction ID can exceed &lt;code&gt;2^32&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e91011271323.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction ID Assignment
 &lt;div id="transaction-id-assignment" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-assignment" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s run a few experiments to see how transaction IDs are assigned. We&amp;rsquo;ll use two functions that return transaction IDs:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_current_xact_id()&lt;/code&gt;: returns the current transaction ID; if the current transaction has not yet been assigned one, it allocates one. (In pg12 and earlier, use &lt;code&gt;txid_current()&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;&lt;code&gt;pg_current_xact_id_if_assigned()&lt;/code&gt;: returns the current transaction ID; if the current transaction has not yet been assigned one, returns NULL. (In pg12 and earlier, use &lt;code&gt;txid_current_if_assigned()&lt;/code&gt;.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Transaction IDs are assigned sequentially:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# select pg_current_xact_id();
 pg_current_xact_id 
--------------------
 612
lzldb=# select pg_current_xact_id();
 pg_current_xact_id 
--------------------
 613
lzldb=# select pg_current_xact_id();
 pg_current_xact_id 
--------------------
 614&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;BEGIN does not immediately allocate a transaction ID:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;lzldb=# begin; -- explicitly start a transaction
BEGIN
lzldb=*# select pg_current_xact_id_if_assigned () ; -- BEGIN does not immediately allocate a transaction ID
 pg_current_xact_id_if_assigned 
-------------------------------- 
(1 row)
lzldb=*# select * from lzl1; -- query immediately after BEGIN
 a 
---
(0 rows)
lzldb=*# select pg_current_xact_id_if_assigned () ; -- queries do not allocate transaction IDs
 pg_current_xact_id_if_assigned 
-------------------------------- 
(1 row)
lzldb=*# insert into lzl1 values(1); -- insert data, a data change
INSERT 0 1
lzldb=*# select pg_current_xact_id_if_assigned () ; -- the first non-query statement after BEGIN allocates a transaction ID
 pg_current_xact_id_if_assigned 
--------------------------------
 611
lzldb=*# commit;
COMMIT
lzldb=# select xmin, pg_current_xact_id_if_assigned () from lzl1; -- the INSERT transaction writes to xmin
 xmin | pg_current_xact_id_if_assigned 
------+--------------------------------
 611 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Some records in system catalogs were assigned &lt;code&gt;BootstrapTransactionId=1&lt;/code&gt; during database initialization:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;pre tabindex="0"&gt;&lt;code class="language-sqlite" data-lang="sqlite"&gt;postgres=# select xmin,count(*) from pg_class where xmin=1 group by xmin;
 xmin | count 
------+-------
 1 | 184&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Conclusions from the experiments:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;During database initialization, the special transaction ID 1 is assigned, visible in system catalogs.&lt;/li&gt;
&lt;li&gt;Transaction IDs are assigned incrementally.&lt;/li&gt;
&lt;li&gt;BEGIN does not immediately allocate a transaction ID; the first non-query statement after BEGIN allocates one.&lt;/li&gt;
&lt;li&gt;When a transaction inserts a tuple, the transaction&amp;rsquo;s txid is written into the tuple&amp;rsquo;s xmin.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Transaction ID Comparison
 &lt;div id="transaction-id-comparison" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-comparison" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL compares the age of transactions by their transaction IDs. &lt;code&gt;src/backend/access/transam/transam.c&lt;/code&gt; defines four comparison functions: &lt;code&gt;&amp;lt;&lt;/code&gt;, &lt;code&gt;&amp;lt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;gt;=&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdPrecedes&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdPrecedesOrEquals&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdFollows&lt;/span&gt;()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;TransactionIdFollowsOrEquals&lt;/span&gt;()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;They are similar. Let&amp;rsquo;s examine &lt;code&gt;TransactionIdPrecedes()&lt;/code&gt; as the representative:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdPrecedes&lt;/span&gt;(TransactionId id1, TransactionId id2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * If either ID is a permanent XID then we can just do unsigned
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * comparison. If both are normal, do a modulo-2^32 comparison.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;int32 diff;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(id1) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(id2))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (id1 &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; id2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;diff &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (int32) (id1 &lt;span style="color:#f92672"&gt;-&lt;/span&gt; id2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (diff &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Key points from this source code:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;TransactionIdIsNormal()&lt;/code&gt; is a macro defined in the header to check for normal transactions. &lt;code&gt;FirstNormalTransactionId&lt;/code&gt; is the constant 3. So a normal transaction ID is &amp;gt;= 3.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdIsNormal(xid) ((xid) &amp;gt;= FirstNormalTransactionId)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;int32&lt;/code&gt; is a signed integer: the first bit being 0 means positive, 1 means negative. Range: &lt;code&gt;-2^31 ~ 2^31 - 1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Integer overflow: when a value exceeds the storage range (e.g., &lt;code&gt;2^31&lt;/code&gt; barely overflows for int32), the value wraps around.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The transaction ID comparison code can be understood in two parts:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Non-normal transaction ID comparison:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(id1) &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(id2))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (id1 &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; id2);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;id1=2&lt;/code&gt;, &lt;code&gt;id2=100&lt;/code&gt;: &lt;code&gt;return(2&amp;lt;100)&lt;/code&gt;, precedes is true — the normal transaction is newer.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1=100&lt;/code&gt;, &lt;code&gt;id2=2&lt;/code&gt;: &lt;code&gt;return(100&amp;lt;2)&lt;/code&gt;, precedes is false — the normal transaction is newer.&lt;/p&gt;
&lt;p&gt;So, txid 1 and 2 are older than normal transactions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Normal transaction ID comparison:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;diff &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (int32) (id1 &lt;span style="color:#f92672"&gt;-&lt;/span&gt; id2);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; (diff &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;id1 - id2&lt;/code&gt; can be negative, so &lt;code&gt;diff&lt;/code&gt; cannot be unsigned int. It must be cast to signed int. Now the crucial part:&lt;/p&gt;
&lt;p&gt;Since int32 ranges from &lt;code&gt;-2^31&lt;/code&gt; to &lt;code&gt;2^31 - 1&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1 = 2^31 + 99&lt;/code&gt;, &lt;code&gt;id2 = 100&lt;/code&gt;: &lt;code&gt;id1 - id2 = 2^31 - 1&lt;/code&gt;. Fine — int32 can hold this. → Larger txid is newer.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1 = 2^31 + 100&lt;/code&gt;, &lt;code&gt;id2 = 100&lt;/code&gt;: &lt;code&gt;id1 - id2 = 2^31&lt;/code&gt;. Problem — exactly exceeds int32 storage. The value becomes &lt;code&gt;2^31 - 2^32 = -2^31 &amp;lt; 0&lt;/code&gt;. → Smaller txid is considered newer.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1 = 100&lt;/code&gt;, &lt;code&gt;id2 = 2^31 + 100&lt;/code&gt;: &lt;code&gt;id1 - id2 = -2^31&lt;/code&gt;. Fine — int32 can hold this. → Larger txid is newer.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;id1 = 100&lt;/code&gt;, &lt;code&gt;id2 = 2^31 + 101&lt;/code&gt;: &lt;code&gt;id1 - id2 = -2^31 - 1&lt;/code&gt;. Problem — exactly exceeds int32 storage. The value becomes &lt;code&gt;-2^31 - 1 + 2^32 = 2^31 - 1 &amp;gt; 0&lt;/code&gt;. → Smaller txid is considered newer.&lt;/p&gt;
&lt;p&gt;From this analysis, when integer overflow occurs, a transaction with a larger txid cannot see a transaction with a smaller txid. The overflow itself is an exceptional event, so this is acceptable. To address this, PostgreSQL divides the 4-billion transaction ID space into two halves: one half is visible, the other invisible.&lt;/p&gt;
&lt;p&gt;For example, for transaction txid 100, the 2 billion transactions in its past are visible, and the 2 billion transactions in its future are invisible. Therefore, the maximum difference between the oldest and newest transaction IDs (the database age) in PostgreSQL is &lt;code&gt;|-2^31| = 2^31&lt;/code&gt;, roughly 2 billion.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b39c0f44d535.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction ID Wraparound
 &lt;div id="transaction-id-wraparound" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-wraparound" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What is transaction ID wraparound?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Understanding transaction ID wraparound itself is not difficult, but when I first studied it, I found two different definitions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL official definition:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Because transaction IDs are limited in size (32 bits), a cluster that runs for a long time (more than 4 billion transactions) will suffer transaction ID wraparound: the XID counter wraps around to zero, and suddenly past transactions appear to be in the future — meaning they become invisible. In short, catastrophic data loss. (The data is still there, but you can&amp;rsquo;t access it.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;interdb explanation:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A tuple&amp;rsquo;s t_xmin records the minimum transaction of that tuple. If the tuple never changes, this t_xmin stays the same. Suppose tuple_1 was created by transaction txid=100, so its t_xmin=100. If the database advances by &lt;code&gt;2^31&lt;/code&gt; transactions, reaching &lt;code&gt;2^31+100&lt;/code&gt;, tuple_1 is still visible. Then another transaction starts, advancing txid to &lt;code&gt;2^31+101&lt;/code&gt;. Now txid=100 is in the &amp;ldquo;future,&amp;rdquo; so tuple_1 becomes invisible. This is severe data loss — this is transaction ID wraparound.&lt;/p&gt;
&lt;p&gt;Yes, the official documentation and some classic articles define transaction ID wraparound differently. They are indeed describing two different things. I attribute this to a &lt;strong&gt;translation issue&lt;/strong&gt;: both behaviors are &lt;strong&gt;wraparound&lt;/strong&gt; in English semantics. If you reconsider the meaning of &amp;ldquo;wraparound,&amp;rdquo; they are both forms of it.&lt;/p&gt;
&lt;p&gt;However, they differ: one is when transaction IDs (&lt;code&gt;2^32&lt;/code&gt;) are fully exhausted and wrap back to 0; the other is when the &amp;ldquo;oldest transaction ID&amp;rdquo; and &amp;ldquo;newest transaction ID&amp;rdquo; differ by more than &lt;code&gt;2^31&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The official definition of transaction ID wraparound introduces the concept that &amp;ldquo;transaction IDs form a circle.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;The generally understood transaction ID wraparound problem is the &amp;ldquo;circle divided into two halves, one visible, one invisible&amp;rdquo; concept — when the &amp;ldquo;more than half&amp;rdquo; threshold is crossed, that&amp;rsquo;s wraparound.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In practice, the wraparound problem you actually need to worry about is the latter: the difference between the newest and oldest transaction IDs must not exceed 2.1 billion (&lt;code&gt;2^31&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How long does 2.1 billion transactions take?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;2.1 billion transactions sounds like a lot, but it can still be exhausted.&lt;/p&gt;
&lt;p&gt;For example, a PostgreSQL database with 100 TPS (not counting SELECT statements, since simple SELECTs don&amp;rsquo;t allocate transaction IDs) uses 8,640,000 transactions per day. It takes only about 2,147,483,648 / 8,640,000 ≈ 248 days to exhaust 2.1 billion transaction IDs and trigger wraparound. At 1,000 transactions per second, it takes less than one month. So transaction ID wraparound is something you must pay attention to in PostgreSQL.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction ID Freezing
 &lt;div id="transaction-id-freezing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-freezing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To solve the serious data loss problem caused by transaction ID wraparound, PostgreSQL introduced the concept of transaction freezing.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/203cfe4768b1.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;XIDs are reused cyclically and divided into two halves: one visible, one invisible. For a tuple with xid=100, if no operations are performed and transaction IDs keep advancing, the once-visible tuple will eventually become invisible.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7512304ffdf5.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;As mentioned earlier, there is a frozen transaction ID. If the tuple with xid=100 is marked with the frozen transaction ID, it will remain visible. This is the purpose of transaction freezing.&lt;/p&gt;
&lt;p&gt;The frozen transaction ID &lt;code&gt;FrozenTransactionId = 2&lt;/code&gt;, and it is older than all normal transactions. That means txid=2 is visible to all normal transactions (txid &amp;gt;= 3). When t_xmin is older than &lt;code&gt;current_txid - vacuum_freeze_min_age&lt;/code&gt; (default 50 million), the tuple is rewritten with the frozen transaction ID 2. In version 9.4 and later, the &lt;code&gt;xmin_frozen&lt;/code&gt; flag in t_infomask is used to indicate a frozen tuple, rather than rewriting t_xmin to 2.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/352182ad7218.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;There are many optimization approaches to the transaction ID wraparound problem, but none can avoid transaction freezing. Freezing involves reading every row of every table and resetting flags — a massive I/O and CPU operation. There&amp;rsquo;s no escaping it; the database may even reject all operations until freezing completes. This is known as the &amp;ldquo;freeze bomb.&amp;rdquo; The busier the system and the higher the transaction rate, the more likely it is to trigger. (We&amp;rsquo;ll expand on freeze optimization in a future chapter.)&lt;/p&gt;

&lt;h3 class="relative group"&gt;64-bit Transaction IDs
 &lt;div id="64-bit-transaction-ids" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#64-bit-transaction-ids" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The &lt;strong&gt;ultimate solution&lt;/strong&gt; to transaction ID exhaustion and wraparound is using 64-bit transaction IDs. A 32-bit txid provides &lt;code&gt;2^32&lt;/code&gt; IDs; a 64-bit txid provides &lt;code&gt;2^64&lt;/code&gt;. Even at 10,000 transactions per second — 864 million per day — it would take 58.49 million years to exhaust them. With 64-bit transaction IDs, they are practically inexhaustible. No wraparound, no freezing, no &amp;ldquo;freeze bomb&amp;rdquo;&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why hasn&amp;rsquo;t 64-bit transaction ID been implemented yet?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Note: 64-bit transaction IDs already exist in PostgreSQL (as &lt;code&gt;FullTransactionId&lt;/code&gt; described earlier). However, because tuple storage is limited, the xmin, xmax, etc. in tuples still use 32-bit XIDs, and transaction ID comparison still relies on 32-bit XIDs. xmin and xmax — the transaction IDs for insert and delete — are stored in each tuple&amp;rsquo;s header (we&amp;rsquo;ll cover tuple structure later), and header space is limited. A 32-bit txid is 4 bytes; a 64-bit txid is 8 bytes. Storing both xmin and xmax as 64-bit would require an extra 8 bytes, which the current header cannot accommodate. The community has discussed two approaches:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Extend the header to store 64-bit transaction IDs directly.&lt;/li&gt;
&lt;li&gt;Keep the header size unchanged. Retain 64-bit transaction IDs in memory, adding an epoch concept to convert between the two.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The first approach has been essentially abandoned — compared to other systems, PostgreSQL&amp;rsquo;s tuple header is already large enough.&lt;/p&gt;
&lt;p&gt;The second approach already has epochs and FullTransactionId-to-TransactionId conversion. The key is how to convert the TransactionId in tuples to FullTransactionId (though some extra storage for the epoch would still be needed — otherwise, how to implement it?).&lt;/p&gt;
&lt;p&gt;See community mailing list discussions:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/CAEYLb_UfC&amp;#43;HZ4RAP7XuoFZr&amp;#43;2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/message-id/CAEYLb_UfC+HZ4RAP7XuoFZr+2_ktQmS9xqcQgE-rNf5UCqEt5A@mail.gmail.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/message-id/flat/DA1E65A4-7C5A-461D-B211-2AD5F9A6F2FD@gmail.com" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/message-id/flat/DA1E65A4-7C5A-461D-B211-2AD5F9A6F2FD%40gmail.com&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The community proposed 64-bit transaction IDs as a permanent solution to the freeze problem back in 2014, and began discussing practical implementation in 2017. But after several PostgreSQL versions, it&amp;rsquo;s still vaporware. Given the sensitivity and importance of data in databases, and how many things transaction ID changes touch — one slip could mean data loss or unknown bugs — PostgreSQL is moving cautiously. However, the community is still considering it. Hopefully one day, in some PostgreSQL version, the transaction ID wraparound problem will be completely solved.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Transaction ID References
 &lt;div id="transaction-id-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-id-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.slideshare.net/masahikosawada98/introduction-vauum-freezing-xid-wraparound?from_action=save" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/masahikosawada98/introduction-vauum-freezing-xid-wraparound?from_action=save&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/427012" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/427012&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/377530" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/377530&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/routine-vacuuming.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/routine-vacuuming.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/weixin_30916255/article/details/112365965" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/weixin_30916255/article/details/112365965&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/FullTransactionId" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/FullTransactionId&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.bookstack.cn/read/aliyun-rds-core/bd7e1c1955b35f7d.md" target="_blank" rel="noreferrer"&gt;https://www.bookstack.cn/read/aliyun-rds-core/bd7e1c1955b35f7d.md&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/digoal/blog/blob/master/201605/20160520_01.md" target="_blank" rel="noreferrer"&gt;https://github.com/digoal/blog/blob/master/201605/20160520_01.md&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction-Related Tuple Structure
 &lt;div id="transaction-related-tuple-structure" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-related-tuple-structure" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The tuple structure contains much of the information essential to PostgreSQL&amp;rsquo;s MVCC. The following sections cover xmin, xmax, t_ctid, cmin, cmax, combo CID, and tuple ID — their meanings and relationships.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Physical Structure
 &lt;div id="physical-structure" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#physical-structure" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2d7dd2db28e1.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;HeapTupleHeaderData&lt;/code&gt; is the tuple header. Its structure is defined in &lt;code&gt;src/include/access/htup_details.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; HeapTupleFields
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId t_xmin;		&lt;span style="color:#75715e"&gt;/* transaction ID of inserter */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId t_xmax;		&lt;span style="color:#75715e"&gt;/* transaction ID of deleter or locker */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;union&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		CommandId	t_cid;		&lt;span style="color:#75715e"&gt;/* command ID of insert or delete */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		TransactionId t_xvac;	&lt;span style="color:#75715e"&gt;/* VACUUM FULL transaction ID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}			t_field3;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} HeapTupleFields;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; DatumTupleFields
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} DatumTupleFields;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; HeapTupleHeaderData
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;union&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		HeapTupleFields t_heap;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		DatumTupleFields t_datum;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}			t_choice;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ItemPointerData t_ctid;		&lt;span style="color:#75715e"&gt;/* TID of current tuple or updated tuple */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Five definitions in &lt;code&gt;HeapTupleHeaderData&lt;/code&gt; are critically important to MVCC. (Here, &amp;ldquo;x&amp;rdquo; = transaction, &amp;ldquo;c&amp;rdquo; = command, &amp;ldquo;t&amp;rdquo; = tuple — helpful for categorization.)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;t_xmin&lt;/code&gt;: the transaction ID that inserted this tuple.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_xmax&lt;/code&gt;: the transaction ID that deleted this tuple, or the transaction ID that rolled back. If the tuple has not been deleted or updated, xmax is 0. If the delete or update was rolled back, xmax is the rolling-back transaction&amp;rsquo;s ID.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_xvac&lt;/code&gt;: the transaction ID set when the tuple is vacuumed. At that point, the tuple is detached from its original transaction.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_cid&lt;/code&gt;: the command ID (cid). A transaction can contain multiple SQL statements. Commands within a transaction are numbered starting from 0, incrementing sequentially. CommandId is a uint32 type, supporting up to &lt;code&gt;2^32 - 1&lt;/code&gt; commands. To conserve resources, and because queries don&amp;rsquo;t affect row transaction ordering, queries do not increment cid (similar to how transaction IDs are allocated).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;t_ctid&lt;/code&gt;: stores a pointer to itself or to a newer tuple. TID identifies a tuple within a table — it is the tuple&amp;rsquo;s physical address. If a record is modified multiple times, multiple versions exist. These versions are linked via t_ctid, forming a version chain that can be followed to find the latest version.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;System Columns
 &lt;div id="system-columns" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#system-columns" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Every tuple has 6 system columns (directly queryable): &lt;code&gt;tableoid&lt;/code&gt;, &lt;code&gt;xmin&lt;/code&gt;, &lt;code&gt;xmax&lt;/code&gt;, &lt;code&gt;cmin&lt;/code&gt;, &lt;code&gt;cmax&lt;/code&gt;, &lt;code&gt;ctid&lt;/code&gt;. &lt;code&gt;tableoid&lt;/code&gt; is the table&amp;rsquo;s OID and doesn&amp;rsquo;t change during queries or DML. Here we focus on the remaining 5:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;616&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;619&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cmin&lt;/code&gt;: the command ID that inserted the tuple.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cmax&lt;/code&gt;: the command ID that deleted the tuple.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;xmin&lt;/code&gt;, &lt;code&gt;xmax&lt;/code&gt;, and &lt;code&gt;xvac&lt;/code&gt; are physically stored in &lt;code&gt;struct HeapTupleFields&lt;/code&gt;. But &lt;code&gt;cmin&lt;/code&gt; and &lt;code&gt;cmax&lt;/code&gt; are not separate fields — they are derived from &lt;code&gt;t_cid&lt;/code&gt; in the struct.&lt;/p&gt;
&lt;p&gt;The source for &lt;code&gt;cmin&lt;/code&gt; and &lt;code&gt;cmax&lt;/code&gt; is in &lt;code&gt;src/include/access/htup_details.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* SetCmin is reasonably simple since we never need a combo CID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HeapTupleHeaderSetCmin(tup, cid) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;do { \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	Assert(!((tup)-&amp;gt;t_infomask &amp;amp; HEAP_MOVED)); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(tup)-&amp;gt;t_choice.t_heap.t_field3.t_cid = (cid); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(tup)-&amp;gt;t_infomask &amp;amp;= ~HEAP_COMBOCID; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;} while (0)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* SetCmax must be used after HeapTupleHeaderAdjustCmax; see combocid.c */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HeapTupleHeaderSetCmax(tup, cid, iscombo) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;do { \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	Assert(!((tup)-&amp;gt;t_infomask &amp;amp; HEAP_MOVED)); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(tup)-&amp;gt;t_choice.t_heap.t_field3.t_cid = (cid); \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	if (iscombo) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		(tup)-&amp;gt;t_infomask |= HEAP_COMBOCID; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	else \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		(tup)-&amp;gt;t_infomask &amp;amp;= ~HEAP_COMBOCID; \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;} while (0)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * HeapTupleHeaderGetRawCommandId will give you what&amp;#39;s in the header whether
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * it is useful or not. Most code should use HeapTupleHeaderGetCmin or
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * HeapTupleHeaderGetCmax instead, but note that those Assert that you can
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * get a legitimate result, ie you are in the originating transaction!
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HeapTupleHeaderGetRawCommandId(tup) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;( \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	(tup)-&amp;gt;t_choice.t_heap.t_field3.t_cid \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Combo CID
 &lt;div id="combo-cid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#combo-cid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Before 8.3, &lt;code&gt;cmin&lt;/code&gt; and &lt;code&gt;cmax&lt;/code&gt; were separate. Later, considering that it&amp;rsquo;s rare for a single transaction to both insert and delete the same row, and that &lt;code&gt;cmin&lt;/code&gt;/&lt;code&gt;cmax&lt;/code&gt; are not needed after the transaction ends, the two were merged into a &amp;ldquo;combo command ID,&amp;rdquo; or &lt;code&gt;combocid&lt;/code&gt;, to save header space.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;combocid&lt;/code&gt; source: &lt;code&gt;src/backend/utils/time/combocid.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Key and entry structures for the hash table */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;typedef struct
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CommandId	cmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CommandId	cmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; ComboCidKeyData;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* comboid structure is cmin and cmax */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; CommandId
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;GetComboCommandId(CommandId cmin, CommandId cmax)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The hash table is only created the first time a combo cid is used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (comboHash &lt;span style="color:#f92672"&gt;==&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HASHCTL		hash_ctl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* generate array and hash table */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	comboCids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (ComboCidKeyData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		MemoryContextAlloc(TopTransactionContext,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						 sizeof(ComboCidKeyData) &lt;span style="color:#f92672"&gt;*&lt;/span&gt; CCID_ARRAY_SIZE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sizeComboCids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; CCID_ARRAY_SIZE;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	usedComboCids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	memset(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;hash_ctl, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, sizeof(hash_ctl));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	comboHash &lt;span style="color:#f92672"&gt;=&lt;/span&gt; hash_create(&lt;span style="color:#e6db74"&gt;&amp;#34;Combo CIDs&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							CCID_HASH_SIZE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;hash_ctl,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							HASH_ELEM &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH_BLOBS &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH_CONTEXT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;combocid&lt;/code&gt; is stored in a hash table. The first time a transaction uses &lt;code&gt;combocid&lt;/code&gt;, a small block of memory is allocated to store it.&lt;/p&gt;
&lt;p&gt;So the relationship among these command IDs is: &lt;strong&gt;combocid → (cmin, cmax) → (t_cid, t_cid)&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Simple Relationships Among Transaction IDs and System Columns
 &lt;div id="simple-relationships-among-transaction-ids-and-system-columns" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#simple-relationships-among-transaction-ids-and-system-columns" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;With all these IDs and source code, things might seem confusing. Here&amp;rsquo;s a diagram to help understand and remember the relationships among transaction IDs, command IDs, and tuple IDs:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/077888610817.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;A First Taste of Transactions
 &lt;div id="a-first-taste-of-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-first-taste-of-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Without any tools or extensions, let&amp;rsquo;s get a feel for how these system columns change during a transaction:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;622&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- after update, xmin+1, ctid+1; a new tuple appears
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;623&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- xmax records the rollback transaction ID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- xmin and ctid return to old values; the old tuple barely changes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;622&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;623&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- update again; tuple number jumps over 2 directly to 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,cmax,ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;624&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Tuple Header and Transactions
 &lt;div id="tuple-header-and-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tuple-header-and-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The pageinspect Extension
 &lt;div id="the-pageinspect-extension" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pageinspect-extension" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Simply looking at row changes won&amp;rsquo;t show old tuples. You need the pageinspect extension. pageinspect is a contrib module bundled with PostgreSQL that can display the detailed contents of data pages. To observe how tuples support transactions, we&amp;rsquo;ll use &lt;code&gt;get_raw_page()&lt;/code&gt; and &lt;code&gt;heap_page_items()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;get_raw_page()&lt;/code&gt;: returns the binary content of a specified block. The &lt;code&gt;fork&lt;/code&gt; parameter accepts &lt;code&gt;main&lt;/code&gt;, &lt;code&gt;fsm&lt;/code&gt;, &lt;code&gt;vm&lt;/code&gt;, or &lt;code&gt;init&lt;/code&gt;. &lt;code&gt;main&lt;/code&gt; is the main data file; &lt;code&gt;fsm&lt;/code&gt; is the free space map; &lt;code&gt;vm&lt;/code&gt; is the visibility map; &lt;code&gt;init&lt;/code&gt; is the initialization fork. Defaults to &lt;code&gt;main&lt;/code&gt; if not specified.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;heap_page_items()&lt;/code&gt;: displays all line pointers on a heap page, including rows invisible under MVCC.&lt;/p&gt;
&lt;p&gt;Generally, &lt;code&gt;get_raw_page()&lt;/code&gt; is passed as a parameter to &lt;code&gt;heap_page_items()&lt;/code&gt; to display tuple headers, pointers, and the data itself.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;heap_tuple_infomask_flags&lt;/code&gt;: converts decimal infomask/infomask2 values into their meanings (flags), outputting two columns: all individual flags and combined flags. (Infomask is covered later.)&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; extension pageinspect;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; EXTENSION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid,t_ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+--------+-------+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;633&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;lp (Line Pointer)
 &lt;div id="lp-line-pointer" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lp-line-pointer" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A line pointer is essentially a row pointer &lt;strong&gt;number&lt;/strong&gt; within a page, marking a tuple&amp;rsquo;s location. t_ctid looks more like a tuple ID, but ctid is simply the combination of (table page number, line pointer number). ctid can point to the next lp.&lt;/p&gt;
&lt;p&gt;For example, after one UPDATE, a new tuple is added. The new tuple&amp;rsquo;s lp number increments by 1, the old tuple&amp;rsquo;s ctid points to the new tuple&amp;rsquo;s lp, and the new tuple&amp;rsquo;s ctid points to itself:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lp,t_ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lp,t_ctid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;lp source: &lt;code&gt;src/include/storage/itemid.h&lt;/code&gt;. The &lt;code&gt;ItemIdData&lt;/code&gt; struct stores the tuple&amp;rsquo;s offset, state, and length:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; ItemIdData
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;unsigned&lt;/span&gt;	lp_off:&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;,		&lt;span style="color:#75715e"&gt;/* tuple offset within the page */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				lp_flags:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,		&lt;span style="color:#75715e"&gt;/* lp state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				lp_len:&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;;		&lt;span style="color:#75715e"&gt;/* tuple length */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} ItemIdData;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; ItemIdData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;ItemId;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* lp_off:15 is a bit-field; lp_off occupies 15 bits of the unsigned. The 3 fields together total 32 bits. So ItemIdData is an int, 4 bytes, 32 bits. */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;lp_flags&lt;/code&gt; defines 4 states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *lp_flags has these possible states. An UNUSED line pointer is available
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *for immediate re-use, the other states are not.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_UNUSED		0		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* lp not in use, tuple length lp_len always 0 */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_NORMAL		1		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* lp in use, tuple length lp_len always &amp;gt; 0 */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_REDIRECT		2		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* HOT redirect to another lp (should have lp_len=0) */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_DEAD			3		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* dead lp, vacuumable */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lp,lp_flags,lp_off,lp_len &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_off &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_len 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+----------+--------+--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8160&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Infomask
 &lt;div id="infomask" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Infomask provides information about transactions, locks, tuple state, etc. — such as committed, aborted, lock, HOT info, and more. There are two infomask fields in the header: &lt;code&gt;infomask&lt;/code&gt; and &lt;code&gt;infomask2&lt;/code&gt;. They store different information.&lt;/p&gt;

&lt;h4 class="relative group"&gt;infomask and infomask2
 &lt;div id="infomask-and-infomask2" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask-and-infomask2" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;infomask&lt;/code&gt; source is in &lt;code&gt;src/include/access/htup_details.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint16		t_infomask2;	&lt;span style="color:#75715e"&gt;/* number of attributes + various flags */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint16		t_infomask;		&lt;span style="color:#75715e"&gt;/* various flag bits, see below */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;infomask Flag Meanings
 &lt;div id="infomask-flag-meanings" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask-flag-meanings" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * information stored in t_infomask:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HASNULL			0x0001	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has null values */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HASVARWIDTH		0x0002	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has variable-width attributes, e.g. varchar */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HASEXTERNAL		0x0004	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has TOAST storage */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HASOID_OLD			0x0008	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has OID */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_KEYSHR_LOCK	0x0010	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has FOR KEY SHARE lock */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_COMBOCID			0x0020	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* t_cid is a combo CID */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_EXCL_LOCK		0x0040	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple has FOR UPDATE lock */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_LOCK_ONLY		0x0080	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* xmax is only a locker */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;/* xmax is a shared locker */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_SHR_LOCK	(HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_LOCK_MASK	(HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 HEAP_XMAX_KEYSHR_LOCK)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_COMMITTED		0x0100	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* inserting transaction committed */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_INVALID		0x0200	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* inserting transaction invalid or aborted */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_FROZEN		(HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_COMMITTED		0x0400	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* deleting transaction committed */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_INVALID		0x0800	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* deleting transaction invalid or aborted */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_IS_MULTI		0x1000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* t_xmax is a MultiXactId */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_UPDATED			0x2000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* this is an updated version of a row */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_MOVED_OFF			0x4000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* moved elsewhere by pre-9.0 VACUUM FULL; kept for binary upgrade compatibility */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_MOVED_IN			0x8000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* moved from elsewhere, opposite of HEAP_MOVED_OFF; kept for compatibility */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XACT_MASK			0xFFF0	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* visibility-related bits */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;infomask2 Flag Meanings
 &lt;div id="infomask2-flag-meanings" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask2-flag-meanings" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_NATTS_MASK			0x07FF	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* 11 bits for the number of columns (MaxHeapAttributeNumber is 1600) */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* bits 0x1800 are available */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_KEYS_UPDATED		0x2000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple updated or deleted */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_HOT_UPDATED		0x4000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* tuple updated, new tuple is HOT */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_ONLY_TUPLE			0x8000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* HOT tuple */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP2_XACT_MASK			0xE000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* visibility-related bits */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_TUPLE_HAS_MATCH	HEAP_ONLY_TUPLE 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* flag temporarily used in Hash Join, only for Hash table tuples that don&amp;#39;t need visibility info; we can reuse a visibility flag instead of a separate bit */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;infomask Bit Calculation
 &lt;div id="infomask-bit-calculation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#infomask-bit-calculation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Converting hex to binary makes it easier to understand the &lt;strong&gt;bit&lt;/strong&gt; meanings:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- convert hex 1600 to binary
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; x&lt;span style="color:#e6db74"&gt;&amp;#39;1600&amp;#39;&lt;/span&gt;::bit(&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; bit 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0001011000000000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;infomask:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000000001&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0001&lt;/span&gt; HEAP_HASNULL			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000000010&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0002&lt;/span&gt; HEAP_HASVARWIDTH		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000000100&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0004&lt;/span&gt; HEAP_HASEXTERNAL		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000001000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0008&lt;/span&gt; HEAP_HASOID_OLD			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000010000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0010&lt;/span&gt; HEAP_XMAX_KEYSHR_LOCK	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000100000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0020&lt;/span&gt; HEAP_COMBOCID
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000001000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0040&lt;/span&gt; HEAP_XMAX_EXCL_LOCK
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000010000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0080&lt;/span&gt; HEAP_XMAX_LOCK_ONLY		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000001010000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0050&lt;/span&gt; HEAP_XMAX_SHR_LOCK bitwise OR: (HEAP_XMAX_EXCL_LOCK &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HEAP_XMAX_KEYSHR_LOCK)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000001010000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0050&lt;/span&gt; HEAP_LOCK_MASK bitwise OR: (HEAP_XMAX_SHR_LOCK &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HEAP_XMAX_EXCL_LOCK &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HEAP_XMAX_KEYSHR_LOCK)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000100000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0100&lt;/span&gt; HEAP_XMIN_COMMITTED		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000001000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0200&lt;/span&gt; HEAP_XMIN_INVALID		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000001100000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0300&lt;/span&gt; HEAP_XMIN_FROZEN bitwise OR: (HEAP_XMIN_COMMITTED&lt;span style="color:#f92672"&gt;|&lt;/span&gt;HEAP_XMIN_INVALID)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;300&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000010000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0400&lt;/span&gt; HEAP_XMAX_COMMITTED		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000100000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x0800&lt;/span&gt; HEAP_XMAX_INVALID		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0001000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x1000&lt;/span&gt; HEAP_XMAX_IS_MULTI		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0010000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x2000&lt;/span&gt; HEAP_UPDATED			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0100000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x4000&lt;/span&gt; HEAP_MOVED_OFF			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1000000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x8000&lt;/span&gt; HEAP_MOVED_IN			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1100000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0xC000&lt;/span&gt; HEAP_MOVED bitwise OR: (HEAP_MOVED_OFF &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HEAP_MOVED_IN)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4000&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8000&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1111111111110000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0xFFF0&lt;/span&gt; HEAP_XACT_MASK&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;infomask2:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000011111111111&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x07FF&lt;/span&gt; HEAP_NATTS_MASK PostgreSQL max columns is &lt;span style="color:#ae81ff"&gt;1600&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000011001000000&lt;/span&gt;, so &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; bits suffice &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; column count
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0001100000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x1800&lt;/span&gt; available bits, apparently unused
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0010000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x2000&lt;/span&gt; HEAP_KEYS_UPDATED 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0100000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x4000&lt;/span&gt; HEAP_HOT_UPDATED 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1000000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x8000&lt;/span&gt; HEAP_ONLY_TUPLE 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1110000000000000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0xE000&lt;/span&gt; HEAP2_XACT_MASK&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;How to Compute Infomask?
 &lt;div id="how-to-compute-infomask" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-to-compute-infomask" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Infomask flags are hexadecimal. pageinspect returns them as decimal. Use &lt;code&gt;to_hex()&lt;/code&gt; to convert from decimal to hexadecimal:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lp,t_ctid,to_hex(t_infomask) infomask,to_hex(t_infomask2) infomask2 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; infomask &lt;span style="color:#f92672"&gt;|&lt;/span&gt; infomask2 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+--------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;b00 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;infomask=2b00&lt;/code&gt; — still a bit opaque. Convert to binary and match against the flag meanings: &lt;code&gt;0010101100000000 = HEAP_UPDATED + HEAP_XMAX_INVALID + HEAP_XMIN_FROZEN&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Meaning: the tuple was updated, xmax is invalid (0), xmin is frozen (visible to all transactions).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;infomask2=1&lt;/code&gt; — the first 11 bits of binary (first 2047 in decimal, for up to 1600 columns) represent the number of user columns. So 1 means the tuple has only 1 column.&lt;/p&gt;
&lt;p&gt;Manually computing infomask is tedious. Starting from pg13, pageinspect provides the &lt;code&gt;heap_tuple_infomask_flags&lt;/code&gt; function to decode infomask and infomask2. Individual bits are shown as &lt;code&gt;raw_flags&lt;/code&gt;; combined multi-bit flags are shown as &lt;code&gt;combined_flags&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+------------------------------------------------------------------------+--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_FROZEN&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Commit Log (CLOG)
 &lt;div id="commit-log-clog" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#commit-log-clog" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL uses the commit log (CLOG) to store transaction status. PostgreSQL writes the transaction to WAL before completion — that&amp;rsquo;s what WAL means. If a transaction aborts, its status is written to both WAL and CLOG so that during instance recovery, PostgreSQL knows the transaction was not committed.&lt;/p&gt;
&lt;p&gt;When transaction status is needed — for example, when determining visibility — PostgreSQL reads the CLOG.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Transaction status&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/access/clog.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_IN_PROGRESS		0x00
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_COMMITTED		0x01
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_ABORTED			0x02
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TRANSACTION_STATUS_SUB_COMMITTED	 0x03&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The CLOG defines four transaction states: &lt;code&gt;IN_PROGRESS&lt;/code&gt;, &lt;code&gt;COMMITTED&lt;/code&gt;, &lt;code&gt;ABORTED&lt;/code&gt;, &lt;code&gt;SUB_COMMITTED&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Transaction status size&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/access/transam/clog.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* We need two bits per xact, so four xacts fit in a byte */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_BITS_PER_XACT	2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACTS_PER_BYTE 4
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define CLOG_XACT_BITMASK	((1 &amp;lt;&amp;lt; CLOG_BITS_PER_XACT) - 1)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Transaction status is very small — only 2 bits per transaction. One byte can store 4 transaction states. A standard page can hold &lt;code&gt;8K * 4 = 32,768&lt;/code&gt; transaction states.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CLOG persistence&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When PostgreSQL shuts down or checkpoints, CLOG data is written to the &lt;code&gt;pg_clog&lt;/code&gt; directory. In version 10.0 and later, &lt;code&gt;pg_clog&lt;/code&gt; was renamed to &lt;code&gt;pg_xact&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg_xact&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; Mar &lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; 23:33 &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On disk, CLOG files are named 0000, 0001, etc. CLOG files are 256KB in size, while in-memory pages storing transaction states are 8KB. So the 0000 file&amp;rsquo;s size will always be a multiple of 8192. After 32 CLOG pages are written, the next page goes into the 0001 file. PostgreSQL reads transaction states from &lt;code&gt;pg_xact&lt;/code&gt; into memory at startup.&lt;/p&gt;
&lt;p&gt;During system operation, not all transaction states need to be retained in CLOG files forever, so VACUUM periodically deletes no-longer-needed CLOG files.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Hint Bits
 &lt;div id="hint-bits" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hint-bits" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;What Are Hint Bits?
 &lt;div id="what-are-hint-bits" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-are-hint-bits" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Hint bits mark whether the transaction that created or deleted a row has committed or aborted. Without hint bits, determining transaction visibility requires accessing on-disk &lt;code&gt;pg_clog&lt;/code&gt; or &lt;code&gt;pg_subtrans&lt;/code&gt; — a relatively expensive operation. If a tuple has hint bits set, you can determine the tuple&amp;rsquo;s state just by reading the page — no extra access needed.&lt;/p&gt;
&lt;p&gt;The source code uses &lt;code&gt;SetHintBits()&lt;/code&gt; to set hint bits:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			InvalidTransactionId);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;SetHintBits&lt;/code&gt; only sets 2 bits in infomask, for 4 hint bit flags (these 2 bits also combine into &lt;code&gt;HEAP_XMIN_FROZEN&lt;/code&gt; — it&amp;rsquo;s clear that hint bits exist purely to mark transaction state):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_COMMITTED	0x0100	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* inserting or updating transaction committed */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMIN_INVALID		0x0200	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* inserting or updating transaction invalid or aborted */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_COMMITTED		0x0400	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* deleting or updating transaction committed */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_INVALID		0x0800	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* deleting or updating transaction invalid or aborted */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Queries Can Cause Writes
 &lt;div id="queries-can-cause-writes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#queries-can-cause-writes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When a transaction starts, PostgreSQL DML transactions record the transaction ID and status (like t_xmin) in the tuple header. But when the transaction ends, nothing is done to the header. Instead, a subsequent DML, DQL, or VACUUM that scans the relevant tuple triggers &lt;code&gt;SetHintBits&lt;/code&gt; (this happens in &lt;code&gt;HeapTupleSatisfiesMVCC()&lt;/code&gt; when a new snapshot accesses data — we&amp;rsquo;ll cover visibility rules later).&lt;/p&gt;
&lt;p&gt;Before &lt;code&gt;SetHintBits&lt;/code&gt; is triggered, PostgreSQL looks up transaction status in the CLOG. After &lt;code&gt;SetHintBits&lt;/code&gt; is triggered, it reads the hint bits in the data page&amp;rsquo;s tuple header.&lt;/p&gt;
&lt;p&gt;For example, an INSERT statement:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;-#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;-#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;-#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1; &lt;span style="color:#75715e"&gt;-- just a single query
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;a 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After one query, t_infomask changed — the tuple header changed.&lt;/p&gt;
&lt;p&gt;After INSERT, &lt;code&gt;SetHintBits&lt;/code&gt; only had &lt;code&gt;HEAP_XMAX_INVALID&lt;/code&gt;, because INSERT only updates xmin. Whether the transaction commits or aborts (exits or rolls back), xmax is unused and can be set to &lt;code&gt;HEAP_XMAX_INVALID&lt;/code&gt; along with the transaction.&lt;/p&gt;
&lt;p&gt;But the transaction may commit or abort (exit/rollback). Since transaction completion does not update the tuple, &lt;code&gt;HEAP_XMIN_COMMITTED&lt;/code&gt; cannot be set upon completion. During visibility checking (&lt;code&gt;heapam_visibility.c&lt;/code&gt;), the visibility check updates the transaction state by calling &lt;code&gt;SetHintBits&lt;/code&gt; on t_infomask. Thus, the query updated &lt;code&gt;HEAP_XMIN_COMMITTED&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hint bits advantage&lt;/strong&gt;: completing (or failing) data modifications in a transaction produces no writes to the tuple. Commit and rollback are very fast.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hint bits disadvantage&lt;/strong&gt;: if a transaction updates many rows, the next query performing visibility checks may need to read transaction states from pg_clog and update many pages.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Do Hint Bits Generate WAL?
 &lt;div id="do-hint-bits-generate-wal" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#do-hint-bits-generate-wal" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;When checksums are enabled or &lt;code&gt;wal_log_hints&lt;/code&gt; is true, if the first operation to make a page dirty after a checkpoint is updating hint bits, a WAL record is generated — specifically, a Full Page Image — to prevent partial writes that would cause checksum mismatches.&lt;/p&gt;
&lt;p&gt;Therefore, with checksums enabled or &lt;code&gt;wal_log_hints&lt;/code&gt; set to true, even a SELECT can modify page hint bits, which may generate WAL — increasing WAL storage to some extent. If you observe SELECT triggering disk writes, check whether CHECKSUM or &lt;code&gt;wal_log_hints&lt;/code&gt; is enabled.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Why Are Hint Bits Deferred?
 &lt;div id="why-are-hint-bits-deferred" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-are-hint-bits-deferred" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;In &lt;code&gt;src/backend/access/heap/heapam_visibility.c&lt;/code&gt;, within the &lt;code&gt;HeapTupleSatisfiesMVCC()&lt;/code&gt; visibility function, a comment explains why hint bits are deferred:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*While insert/delete operations are still running, hint bits on tuples are not updated,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*even if the transaction has committed or aborted.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*In high-concurrency scenarios, sharing data structures can cause contention,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*and this doesn&amp;#39;t affect visibility decisions anyway.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*Hint bits are only set the first time a fresh snapshot accesses data after transaction completion.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*So HeapTupleSatisfiesMVCC always runs TransactionIdIsCurrentTransactionId and XidInMVCCSnapshot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*to determine whether the tuple belongs to the current transaction.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*In older versions, PostgreSQL tried to update hint bits immediately (even while transactions were running),
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*but this caused more contention on the PGXACT array.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;*/&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Simply put: immediate hint bit updates perform very poorly. So transaction status is first stored in CLOG to reduce PGXACT contention and improve performance. Deferred hint bits are why later queries may update tuple headers.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Tuple DML Operations
 &lt;div id="tuple-dml-operations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tuple-dml-operations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;ve built up knowledge of tuple headers, system columns, CLOG, and hint bits, let&amp;rsquo;s see how PostgreSQL performs INSERT, UPDATE, and DELETE.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Observing DML Transactions
 &lt;div id="observing-dml-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#observing-dml-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;We&amp;rsquo;ll observe PostgreSQL&amp;rsquo;s DML transaction behavior by examining tuple header fields: lp, lp_flags, ctid, xmin, xmax, cid (cmin, cmax), infomask, and infomask2.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ll use the following query:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(A side note: some sources like to write &lt;code&gt;SELECT '(0,'||lp||')' AS ctid&lt;/code&gt;. This is misleading — lp and ctid are different things. lp is like a row number; ctid points to a line pointer number. lp can be different from ctid.)&lt;/p&gt;
&lt;p&gt;For readability, create a view:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;view&lt;/span&gt; vlzl1 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now the query looks like:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;x
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Expanded display &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;--+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;653&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;INSERT
 &lt;div id="insert" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#insert" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Truncate the table, then insert a row:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+---------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;ctid points to (page 0, lp 1), i.e., to itself.&lt;/li&gt;
&lt;li&gt;lp (line pointer number) increments.&lt;/li&gt;
&lt;li&gt;Both tuples share the same xmin — they were inserted by the same transaction.&lt;/li&gt;
&lt;li&gt;xmax is 0 (invalid transaction ID). Infomask only indicates xmax is invalid: this tuple has not yet &amp;ldquo;experienced&amp;rdquo; a delete transaction.&lt;/li&gt;
&lt;li&gt;cid increments from 0: 0 for the first command, 1 for the second.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;DELETE
 &lt;div id="delete" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#delete" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-----------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;665&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The first tuple was deleted. The tuple wasn&amp;rsquo;t physically removed — only a few attributes were marked:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ctid unchanged, still points to itself.&lt;/li&gt;
&lt;li&gt;xmax updated to the delete transaction ID.&lt;/li&gt;
&lt;li&gt;Infomask shows &lt;code&gt;HEAP_KEYS_UPDATED&lt;/code&gt;, indicating the tuple was deleted (actually, &lt;code&gt;HEAP_KEYS_UPDATED&lt;/code&gt; means either deleted or updated).&lt;/li&gt;
&lt;li&gt;Although only the first tuple was modified, the second tuple&amp;rsquo;s infomask was also updated with &lt;code&gt;HEAP_XMIN_COMMITTED&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;UPDATE
 &lt;div id="update" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#update" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-------------------------------------------------------------+----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;665&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;664&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;666&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;666&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;An UPDATE doesn&amp;rsquo;t modify the tuple in place. Instead, it marks the old tuple as unavailable and inserts a new one:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;lp=2 is the old tuple from the update transaction. t_xmax is the update transaction ID. Infomask adds &lt;code&gt;HEAP_HOT_UPDATED&lt;/code&gt;, indicating the tuple is HOT. ctid points to the new tuple.&lt;/li&gt;
&lt;li&gt;lp=3 is the new tuple from the update. It&amp;rsquo;s equivalent to an inserted tuple, but xmin matches the old tuple&amp;rsquo;s xmax. Infomask has the extra flag &lt;code&gt;HEAP_UPDATED&lt;/code&gt;, indicating this is the updated version.&lt;/li&gt;
&lt;li&gt;Additionally, the invisible deleted tuple at lp=1 had its infomask updated with &lt;code&gt;HEAP_XMAX_COMMITTED&lt;/code&gt; by an unrelated subsequent update transaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Rollback
 &lt;div id="rollback" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rollback" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;truncate&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TRUNCATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- INSERT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+---------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;679&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- INSERT rolled back
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+---------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;679&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After INSERT and rollback, the tuple header shows no changes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 ; &lt;span style="color:#75715e"&gt;-- DELETE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-----------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;684&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;685&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;686&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- DELETE rolled back
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-----------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;684&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;685&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;686&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_KEYS_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After DELETE and rollback, the tuple header shows no changes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; ; &lt;span style="color:#75715e"&gt;-- UPDATE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+--------------------------------------------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;684&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;685&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- UPDATE rolled back
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+--------------------------------------------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;684&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;685&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;688&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After UPDATE and rollback, the tuple header shows no changes.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;When a transaction rolls back, tuple information does not change at all. This is why PostgreSQL&amp;rsquo;s MVCC doesn&amp;rsquo;t worry about running out of rollback segments — rollback is purely a visibility operation, not a data update.&lt;/li&gt;
&lt;li&gt;xmax doesn&amp;rsquo;t change after rollback either, which means a non-zero xmax doesn&amp;rsquo;t necessarily indicate the tuple was deleted — the delete or update transaction may have rolled back.&lt;/li&gt;
&lt;li&gt;However, once visibility checking occurs, even without data changes, all tuples&amp;rsquo; infomask will be updated with &lt;code&gt;HEAP_XMIN_INVALID&lt;/code&gt;. Non-HOT tuples get &lt;code&gt;HEAP_XMIN_INVALID&lt;/code&gt;, and HOT-referenced tuples naturally get it too.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;References for Tuple and Transaction
 &lt;div id="references-for-tuple-and-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references-for-tuple-and-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Books:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL in Action&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Internals: Deep Dive into Transaction Processing&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Database Kernel Analysis&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf" target="_blank" rel="noreferrer"&gt;https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Official resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Concurrency_control" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Concurrency_control&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Hint_Bits" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Hint_Bits&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/10/storage-page-layout.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/10/storage-page-layout.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/pageinspect.html3" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/pageinspect.html3&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Essential PostgreSQL transaction reads (interdb):&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Source code experts:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/102920988" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/102920988&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/127955762" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/127955762&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/125023923" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/125023923&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL snapshot optimization performance comparison:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://brandur.org/postgres-atomicity" target="_blank" rel="noreferrer"&gt;https://brandur.org/postgres-atomicity&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Snapshots in PostgreSQL
 &lt;div id="snapshots-in-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshots-in-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A snapshot is a data structure that records the instantaneous state of the database. PostgreSQL&amp;rsquo;s snapshot stores: the minimum and maximum transaction IDs among all active transactions, the list of currently active transactions, the current transaction&amp;rsquo;s command ID, and more.&lt;/p&gt;
&lt;p&gt;Snapshot data is stored in the &lt;code&gt;SnapshotData&lt;/code&gt; struct type. Source: &lt;code&gt;src/include/utils/snapshot.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; SnapshotData
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	SnapshotType snapshot_type; &lt;span style="color:#75715e"&gt;/* snapshot type */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId xmin;			&lt;span style="color:#75715e"&gt;/* txid &amp;lt; xmin are visible to the snapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId xmax;			&lt;span style="color:#75715e"&gt;/* txid &amp;gt;= xmax are invisible to the snapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* list of active transactions at snapshot time. Only includes txids between xmin and xmax */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;xip;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;uint32		xcnt;			&lt;span style="color:#75715e"&gt;/* xip_list stored in xip[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* list of active subtransactions at snapshot time */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;subxip;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;int32		subxcnt;		&lt;span style="color:#75715e"&gt;/* subtransactions stored in subxip[] */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		suboverflowed;	&lt;span style="color:#75715e"&gt;/* whether subtransactions overflowed; overflows occur with many subtransactions */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		takenDuringRecovery;	&lt;span style="color:#75715e"&gt;/* is this a recovery snapshot? */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		copied;			&lt;span style="color:#75715e"&gt;/* whether the snapshot is a copy (RR and serializable copy their snapshots); false if static */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CommandId	curcid;			&lt;span style="color:#75715e"&gt;/* command ID in the transaction; CID &amp;lt; curcid is visible */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TimestampTz whenTaken;		&lt;span style="color:#75715e"&gt;/* timestamp when snapshot was taken */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogRecPtr	lsn;			&lt;span style="color:#75715e"&gt;/* LSN when snapshot was taken */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} SnapshotData;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; SnapshotData &lt;span style="color:#f92672"&gt;*&lt;/span&gt;Snapshot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The most important snapshot information is &lt;code&gt;xmin&lt;/code&gt;, &lt;code&gt;xmax&lt;/code&gt;, and &lt;code&gt;xip_list&lt;/code&gt;. Use &lt;code&gt;pg_current_snapshot()&lt;/code&gt; (in pg12 and earlier, &lt;code&gt;txid_current_snapshot()&lt;/code&gt;) to display the current transaction&amp;rsquo;s snapshot.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note: snapshot xmin/xmax are different from tuple xmin/xmax — they have different meanings.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_current_snapshot();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_current_snapshot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;104&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;102&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;xmin&lt;/th&gt;
 &lt;th&gt;Earliest active txid. All txids older than xmin have either committed (visible) or aborted (dead tuples).&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;xmax&lt;/td&gt;
 &lt;td&gt;First unassigned txid. xmax = latestCompletedXid + 1. All txid &amp;gt;= xmax have not yet started and are invisible to the current snapshot.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;xip_list&lt;/td&gt;
 &lt;td&gt;Stored in array xip[]. Since transactions can start and finish out of order (a later-started transaction may finish earlier), xmin and xmax alone cannot fully express all active transactions at snapshot time. xip_list stores the active transactions at snapshot time.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b7605604abbc.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Snapshot Types
 &lt;div id="snapshot-types" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-types" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Beyond MVCC snapshots, PostgreSQL defines several other snapshot types in &lt;code&gt;src/include/utils/snapshot.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt; SnapshotType
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Tuple is visible if and only if it satisfies MVCC snapshot visibility rules.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The most important snapshot type — used to implement MVCC.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Tuple visibility is judged based on snapshot xmin, xmax, xip_list, curcid, etc.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * If a command changed data, the current MVCC snapshot won&amp;#39;t see it; a new MVCC snapshot is needed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_MVCC &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Tuple is visible if its transaction committed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In-progress transactions are invisible.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Data changes from the current command are visible to the SELF snapshot.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_SELF,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Any tuple is visible.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_ANY,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Visible if the TOAST tuple is valid. TOAST visibility depends on the main table tuple&amp;#39;s visibility.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_TOAST,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Data changes from the current command are visible to the DIRTY snapshot.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The DIRTY snapshot preserves version info for in-progress tuples.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Snapshot xmin is set to the xmin of other in-progress transactions&amp;#39; tuples; xmax is similar.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_DIRTY,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* HISTORIC_MVCC snapshot follows MVCC rules, used for logical decoding.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_HISTORIC_MVCC,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; Determines whether dead tuples are visible to certain transactions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SNAPSHOT_NON_VACUUMABLE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} SnapshotType;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Snapshots and Isolation Levels
 &lt;div id="snapshots-and-isolation-levels" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshots-and-isolation-levels" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Different isolation levels acquire snapshots differently:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ab95b43529f1.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Read Committed requires a new snapshot for each SQL statement in the transaction, while Repeatable Read uses only one snapshot for the entire transaction. The function that acquires snapshots is &lt;code&gt;GetTransactionSnapshot()&lt;/code&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Process-Level Transaction Structures
 &lt;div id="process-level-transaction-structures" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#process-level-transaction-structures" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When PostgreSQL acquires snapshot data, it needs to scan the transaction state of all backend processes.&lt;/p&gt;
&lt;p&gt;Before understanding the &lt;code&gt;GetSnapshotData()&lt;/code&gt; function, we need to understand several backend process structures: PGPROC, PGXACT, PROC_HDR (PROCGLOBAL), and ProcArray.&lt;/p&gt;
&lt;p&gt;These process-related structures contain process and lock information. Here we only study the transaction-related parts. Source code examples are based on pg13.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PGPROC Struct
 &lt;div id="pgproc-struct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pgproc-struct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/storage/proc.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Every backend process stores a PGPROC struct in memory.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Think of this as the backend process&amp;#39;s main structure.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; PGPROC
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LocalTransactionId lxid;	&lt;span style="color:#75715e"&gt;/* local id of top-level transaction currently
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * being executed by this proc, if running;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * else InvalidLocalTransactionId */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; XidCache subxids;	&lt;span style="color:#75715e"&gt;/* cached subtransaction XIDs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* clog group transaction status update */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		clogGroupMember;	&lt;span style="color:#75715e"&gt;/* whether this proc uses clog group commit */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_atomic_uint32 clogGroupNext; &lt;span style="color:#75715e"&gt;/* atomic int, pointing to the next group member proc */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TransactionId clogGroupMemberXid;	&lt;span style="color:#75715e"&gt;/* xid to be committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XidStatus	clogGroupMemberXidStatus;	&lt;span style="color:#75715e"&gt;/* status of the xid to be committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			clogGroupMemberPage;	&lt;span style="color:#75715e"&gt;/* which page the xid to be committed belongs to */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;XLogRecPtr	clogGroupMemberLsn; &lt;span style="color:#75715e"&gt;/* LSN of the commit log for the xid to be committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* NOTE: &amp;#34;typedef struct PGPROC PGPROC&amp;#34; appears in storage/lock.h. Not written with the struct itself. */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;PGXACT Struct
 &lt;div id="pgxact-struct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pgxact-struct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Before 9.2, PGXACT information was inside PGPROC. Stress testing showed that on multi-CPU systems,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// separating them makes GetSnapshotData faster by reducing the number of cache lines fetched.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; PGXACT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xid;			&lt;span style="color:#75715e"&gt;/* id of top-level transaction currently being
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * executed by this proc, if running and XID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 * is assigned; else InvalidTransactionId */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								&lt;span style="color:#75715e"&gt;// appears to be the current process&amp;#39;s xmax
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xmin;			&lt;span style="color:#75715e"&gt;/* excluding lazy vacuum; minimum xid at transaction start;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;								 vacuum cannot remove tuples with xid &amp;gt;= xmin */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint8		vacuumFlags;	&lt;span style="color:#75715e"&gt;/* vacuum-related flags, see above */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		overflowed; &lt;span style="color:#75715e"&gt;// whether PGXACT overflowed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	uint8		nxids;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} PGXACT;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PGXACT stores relatively simple information — the backend&amp;rsquo;s xmin, xmax, and other transaction-related fields. &lt;strong&gt;PGPROC leans toward storing basic backend info; some less frequently accessed transaction info remains in PGPROC, but the core process transaction info is in PGXACT.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;PROC_HDR (PROCGLOBAL) Struct
 &lt;div id="proc_hdr-procglobal-struct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#proc_hdr-procglobal-struct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Every backend process has a proc struct. In high-concurrency scenarios, scanning all proc structs to find transaction info is time-consuming. An instance-level structure is needed to store all proc info — this is PROCGLOBAL.&lt;/p&gt;
&lt;p&gt;The source typically uses the struct type &lt;code&gt;PROC_HDR&lt;/code&gt; to define a struct pointer to PROCGLOBAL. PROC_HDR stores global proc info: the full array of proc structs, free procs, etc.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/storage/proc.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; PROC_HDR
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* pgproc array (not including dummies for prepared txns) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PGPROC	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;allProcs;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* pgxact array (not including dummies for prepared txns) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PGXACT	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;allPgXact;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Current shared estimate of appropriate spins_per_delay value */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			spins_per_delay;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* The proc of the Startup process, since not in ProcArray */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PGPROC	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;startupProc;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			startupProcPid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Buffer id of the buffer that Startup process waits for pin on, or -1 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			startupBufferPinWaitBufId;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} PROC_HDR;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;ProcArray Struct
 &lt;div id="procarray-struct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#procarray-struct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;ProcArray is in &lt;code&gt;procarray.c&lt;/code&gt;, which maintains the PGPROC and PGXACT structures for all backends.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/storage/ipc/procarray.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; ProcArrayStruct
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			numProcs;		&lt;span style="color:#75715e"&gt;/* number of procs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			maxProcs;		&lt;span style="color:#75715e"&gt;/* size of proc array */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// handling assigned xids
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			maxKnownAssignedXids;	&lt;span style="color:#75715e"&gt;/* allocated size of array */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			numKnownAssignedXids;	&lt;span style="color:#75715e"&gt;/* current # of valid entries */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			tailKnownAssignedXids;	&lt;span style="color:#75715e"&gt;/* index of oldest valid element */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			headKnownAssignedXids;	&lt;span style="color:#75715e"&gt;/* index of newest element, + 1 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;slock_t&lt;/span&gt;		known_assigned_xids_lck;	&lt;span style="color:#75715e"&gt;/* protects head/tail pointers */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Highest subxid that has been removed from KnownAssignedXids array to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * prevent overflow; or InvalidTransactionId if none. We track this for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * similar reasons to tracking overflowing cached subxids in PGXACT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * entries. Must hold exclusive ProcArrayLock to change this, and shared
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * lock to read it.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId lastOverflowedXid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* oldest xmin of any replication slot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId replication_slot_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* oldest catalog xmin of any replication slot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId replication_slot_catalog_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* pgprocnos, equivalent to allPgXact[] array indices, used to look up allPgXact[]; this array has PROCARRAY_MAXPROCS entries */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pgprocnos[FLEXIBLE_ARRAY_MEMBER];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} ProcArrayStruct;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; ProcArrayStruct &lt;span style="color:#f92672"&gt;*&lt;/span&gt;procArray;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Acquiring a Snapshot
 &lt;div id="acquiring-a-snapshot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#acquiring-a-snapshot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;GetTransactionSnapshot()
 &lt;div id="gettransactionsnapshot" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#gettransactionsnapshot" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Snapshots are acquired via &lt;code&gt;GetTransactionSnapshot()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/utils/time/snapmgr.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// GetTransactionSnapshot() allocates the appropriate snapshot for SQL in a transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Snapshot
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GetTransactionSnapshot&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#75715e"&gt;// Return historic snapshot if doing logical decoding. We&amp;#39;ll never need a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#75715e"&gt;// non-historic snapshot after this, so return directly.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HistoricSnapshotActive&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;FirstSnapshotSet);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; HistoricSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* If it&amp;#39;s not the first call in this transaction, enter this if */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;FirstSnapshotSet)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Ensure the catalog snapshot is fresh.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;InvalidateCatalogSnapshot&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;pairingheap_is_empty&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;RegisteredSnapshots));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(FirstXactSnapshot &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Return error if in parallel mode
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsInParallelMode&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(ERROR,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#e6db74"&gt;&amp;#34;cannot take query snapshot during a parallel operation&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 &lt;span style="color:#75715e"&gt;// For Repeatable Read or Serializable, use the same snapshot for the entire transaction; only copy once
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 &lt;span style="color:#75715e"&gt;// IsolationUsesXactSnapshot() means the isolation level is RR or Serializable — they use one snapshot per transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsolationUsesXactSnapshot&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// First, create the snapshot in CurrentSnapshotData
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// If SI isolation level, initialize SSI-required data structures
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsolationIsSerializable&lt;/span&gt;()) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSerializableTransactionSnapshot&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;CurrentSnapshotData);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSnapshotData&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;CurrentSnapshotData);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Make a saved copy */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* For Repeatable Read or Serializable, this snapshot lasts the entire transaction; copy once */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CopySnapshot&lt;/span&gt;(CurrentSnapshot);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			FirstXactSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; CurrentSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Mark it as &amp;#34;registered&amp;#34; in FirstXactSnapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			FirstXactSnapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;regd_count&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;pairingheap_add&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;RegisteredSnapshots, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;FirstXactSnapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;ph_node);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// For Read Committed, acquire a snapshot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSnapshotData&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;CurrentSnapshotData);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Modify flag to indicate this is the first snapshot; subsequent calls in this transaction won&amp;#39;t enter this if
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		FirstSnapshotSet &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; CurrentSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// If not the first call in this transaction (already have a first snapshot)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// For Repeatable Read or Serializable, return a copy of the first snapshot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsolationUsesXactSnapshot&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; CurrentSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Don&amp;#39;t allow catalog snapshot to be older than xact snapshot. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;InvalidateCatalogSnapshot&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Read Committed: re-acquire snapshot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CurrentSnapshot &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSnapshotData&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;CurrentSnapshotData);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; CurrentSnapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;About &lt;code&gt;IsolationUsesXactSnapshot()&lt;/code&gt; and &lt;code&gt;IsolationIsSerializable()&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;Defined as macros in &lt;code&gt;src/include/access/xact.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XACT_READ_UNCOMMITTED	0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XACT_READ_COMMITTED	1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XACT_REPEATABLE_READ	2
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define XACT_SERIALIZABLE	3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Internally only 3 isolation levels: 1, 2, 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// 2 isolation levels use one snapshot per transaction; others use one snapshot per SQL statement
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define IsolationUsesXactSnapshot() (XactIsoLevel &amp;gt;= XACT_REPEATABLE_READ)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define IsolationIsSerializable() (XactIsoLevel == XACT_SERIALIZABLE)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;IsolationUsesXactSnapshot()&lt;/code&gt; is true for Repeatable Read or Serializable.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;IsolationIsSerializable()&lt;/code&gt; is true for Serializable only.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;GetTransactionSnapshot()&lt;/code&gt; flow chart:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/578be2dea323.png" alt="image" /&gt;
(image from CSDN: &lt;a href="https://blog.csdn.net/Hehuyi_In" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The main logic of &lt;code&gt;GetTransactionSnapshot()&lt;/code&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For historic snapshots during logical decoding, return the snapshot result directly.&lt;/li&gt;
&lt;li&gt;For Repeatable Read or Serializable: on the first call, return the snapshot and copy it so subsequent calls (non-first) can directly reference it.&lt;/li&gt;
&lt;li&gt;For Read Committed: generate a new snapshot on every call.&lt;/li&gt;
&lt;li&gt;For the first call in Serializable, additionally acquire SSI data information.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GetTransactionSnapshot()&lt;/code&gt; acquires the snapshot; the actual data comes from &lt;code&gt;GetSnapshotData()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 class="relative group"&gt;GetSnapshotData()
 &lt;div id="getsnapshotdata" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#getsnapshotdata" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/storage/ipc/procarray.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Snapshot
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GetSnapshotData&lt;/span&gt;(Snapshot snapshot)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Initialize some variables: arrayP pointer, procarray, xmin, xmax, replication slot txid, etc.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ProcArrayStruct &lt;span style="color:#f92672"&gt;*&lt;/span&gt;arrayP &lt;span style="color:#f92672"&gt;=&lt;/span&gt; procArray;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId globalxmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			index;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			count &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			subcount &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		suboverflowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId replication_slot_xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; InvalidTransactionId;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId replication_slot_catalog_xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; InvalidTransactionId;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(snapshot &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xip &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * First call for this snapshot. Snapshot is same size whether or not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * we are in recovery, see later comments.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xip &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (TransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// get current transaction&amp;#39;s xip
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;malloc&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;GetMaxSnapshotXidCount&lt;/span&gt;() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(TransactionId));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxip &lt;span style="color:#f92672"&gt;==&lt;/span&gt; NULL);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxip &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (TransactionId &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#75715e"&gt;// get current subtransaction&amp;#39;s subxip
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;malloc&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;GetMaxSnapshotSubxidCount&lt;/span&gt;() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(TransactionId));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Acquire procarray; need shared LWLock
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(ProcArrayLock, LW_SHARED);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* xmax = max completed xid + 1 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	xmax &lt;span style="color:#f92672"&gt;=&lt;/span&gt; ShmemVariableCache&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;latestCompletedXid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(xmax));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;TransactionIdAdvance&lt;/span&gt;(xmax); &lt;span style="color:#75715e"&gt;// xmax + 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* xmax value retrieved; xmin needs scanning pgproc, pgxact, procarray */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Set globalxmin and xmin to xmax first; if backends have no transaction info, this is simpler */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	globalxmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmax; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Recovery snapshots handled separately
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;takenDuringRecovery &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;RecoveryInProgress&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Non-recovery snapshots need transaction info from backends
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;takenDuringRecovery)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;		 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pgprocnos &lt;span style="color:#f92672"&gt;=&lt;/span&gt; arrayP&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;pgprocnos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			numProcs;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Spin over procArray checking xid, xmin, and subxids. The goal is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * to gather all active xids, find the lowest xmin, and try to record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * subxids. It appears that while scanning procarray, it will spin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * to collect all active xids, the smallest xmin, and subtransaction subxids.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		numProcs &lt;span style="color:#f92672"&gt;=&lt;/span&gt; arrayP&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;numProcs;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (index &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; index &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; numProcs; index&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			pgprocno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pgprocnos[index]; &lt;span style="color:#75715e"&gt;// iterate numProcs, get all pgprocno indices
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			PGXACT	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;pgxact &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;allPgXact[pgprocno]; &lt;span style="color:#75715e"&gt;// iterate all pgxact structs via pgprocno
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			TransactionId xid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Update globalxmin to be the smallest valid xmin */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			xid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;UINT32_ACCESS_ONCE&lt;/span&gt;(pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(xid) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;NormalTransactionIdPrecedes&lt;/span&gt;(xid, globalxmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				globalxmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Fetch xid just once - see GetNewTransactionId */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			xid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;UINT32_ACCESS_ONCE&lt;/span&gt;(pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Save backend&amp;#39;s xmin into snapshot xip */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* i.e., iterate all pgxact to find all active xids */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xip[count&lt;span style="color:#f92672"&gt;++&lt;/span&gt;] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Subtransaction info handling */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;suboverflowed) &lt;span style="color:#75715e"&gt;// if subtransaction hasn&amp;#39;t overflowed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;overflowed)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					suboverflowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// if transaction overflowed, mark subtransaction as overflowed too
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			nxids &lt;span style="color:#f92672"&gt;=&lt;/span&gt; pgxact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nxids;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (nxids &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						PGPROC	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;proc &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;allProcs[pgprocno];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;pg_read_barrier&lt;/span&gt;();	&lt;span style="color:#75715e"&gt;/* pairs with GetNewTransactionId */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;memcpy&lt;/span&gt;(snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxip &lt;span style="color:#f92672"&gt;+&lt;/span&gt; subcount,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 (&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) proc&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxids.xids,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							 nxids &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(TransactionId));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						subcount &lt;span style="color:#f92672"&gt;+=&lt;/span&gt; nxids;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#75715e"&gt;// the else corresponds to if (!snapshot-&amp;gt;takenDuringRecovery)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// These checks are for standby; when the instance is in hot standby mode and queries run on the replica
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		subcount &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;KnownAssignedXidsGetAndSetXmin&lt;/span&gt;(snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxip, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;xmin,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;												 xmax);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdPrecedesOrEquals&lt;/span&gt;(xmin, procArray&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lastOverflowedXid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			suboverflowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Replication slot xmin and catalog cluster-wide xmin, first save to local variables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Replication slot xmin prevents tuple reclamation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// The comment says this is to avoid holding ProcArrayLock for too long, so save to local variables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	replication_slot_xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; procArray&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;replication_slot_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	replication_slot_catalog_xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; procArray&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;replication_slot_catalog_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Backend transaction info gathering is done; below is a series of ifs for cleanup and code robustness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(MyPgXact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		MyPgXact&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; TransactionXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(ProcArrayLock); &lt;span style="color:#75715e"&gt;// release ProcArrayLock
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdPrecedes&lt;/span&gt;(xmin, globalxmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		globalxmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin; &lt;span style="color:#75715e"&gt;// globalxmin and process xmin: assign globalxmin to the smaller one
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RecentGlobalXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; globalxmin &lt;span style="color:#f92672"&gt;-&lt;/span&gt; vacuum_defer_cleanup_age;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(RecentGlobalXmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		RecentGlobalXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; FirstNormalTransactionId; &lt;span style="color:#75715e"&gt;// edge case: if RecentGlobalXmin &amp;lt;= 2, assign 3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Check whether there&amp;#39;s a replication slot requiring an older xmin. */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(replication_slot_xmin) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;NormalTransactionIdPrecedes&lt;/span&gt;(replication_slot_xmin, RecentGlobalXmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		RecentGlobalXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; replication_slot_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Non-catalog tables can be vacuumed if older than this xid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RecentGlobalDataXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; RecentGlobalXmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Re-check and compare catalog, globalxmin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsNormal&lt;/span&gt;(replication_slot_catalog_xmin) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;NormalTransactionIdPrecedes&lt;/span&gt;(replication_slot_catalog_xmin, RecentGlobalXmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		RecentGlobalXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; replication_slot_catalog_xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	RecentXmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Start assigning values to the snapshot struct, returning snapshot data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmin &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xmax &lt;span style="color:#f92672"&gt;=&lt;/span&gt; xmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;xcnt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; count;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;subxcnt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; subcount;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;suboverflowed &lt;span style="color:#f92672"&gt;=&lt;/span&gt; suboverflowed;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetCurrentCommandId&lt;/span&gt;(false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If it&amp;#39;s a new snapshot, initialize some snapshot info
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;active_count &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;regd_count &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;copied &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Snapshot-too-old logic below; oddly written here
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (old_snapshot_threshold &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * If not using &amp;#34;snapshot too old&amp;#34; feature, fill related fields with
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * dummy values that don&amp;#39;t require any locking.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// When old_snapshot_threshold &amp;lt; 0 (no &amp;#34;snapshot too old&amp;#34; issue)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// assign simple constant values that won&amp;#39;t require any locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lsn &lt;span style="color:#f92672"&gt;=&lt;/span&gt; InvalidXLogRecPtr;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;whenTaken &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// When old_snapshot_threshold &amp;gt;= 0, need to handle old snapshot logic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lsn &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetXLogInsertRecPtr&lt;/span&gt;(); &lt;span style="color:#75715e"&gt;// get LSN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;whenTaken &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;GetSnapshotCurrentTimestamp&lt;/span&gt;(); &lt;span style="color:#75715e"&gt;// get snapshot timestamp
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;MaintainOldSnapshotTimeMapping&lt;/span&gt;(snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;whenTaken, xmin); &lt;span style="color:#75715e"&gt;//
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// GetXLogInsertRecPtr(), GetSnapshotCurrentTimestamp(), MaintainOldSnapshotTimeMapping() 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// all contain SpinLockAcquire and SpinLockRelease
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// MaintainOldSnapshotTimeMapping() also has LWLockAcquire and LWLockRelease
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Since this is called for every snapshot, GetSnapshotData should be very frequent
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// So in pg13 source, setting old_snapshot_threshold to negative avoids many spinlocks and lwlocks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; snapshot;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;pg14 Snapshot Optimizations
 &lt;div id="pg14-snapshot-optimizations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg14-snapshot-optimizations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;pg14 Optimization Source Analysis
 &lt;div id="pg14-optimization-source-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg14-optimization-source-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;From the pg13 source, we can see that &lt;code&gt;GetSnapshotData()&lt;/code&gt; hardcodes &lt;code&gt;old_snapshot_threshold &amp;gt;= 0&lt;/code&gt;, causing each snapshot acquisition to incur many &lt;code&gt;SpinLock&lt;/code&gt; and &lt;code&gt;LWLock&lt;/code&gt; operations. Since snapshot acquisition is extremely frequent, this inevitably causes performance issues. So pg14 simply removed the &lt;code&gt;old_snapshot_threshold&lt;/code&gt; logic from &lt;code&gt;GetSnapshotData()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Beyond that removal, pg14 made many other optimizations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Removed &lt;code&gt;RecentGlobalXmin&lt;/code&gt; and &lt;code&gt;RecentGlobalDataXmin&lt;/code&gt;, added the &lt;code&gt;GlobalVisTest*&lt;/code&gt; family of functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Introduced the &lt;strong&gt;boundaries&lt;/strong&gt; concept with two boundaries: &lt;code&gt;definitely_needed&lt;/code&gt; and &lt;code&gt;maybe_needed&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; GlobalVisState
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* XIDs &amp;gt;= are considered running by some backend */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// rows with XID &amp;gt;= definitely_needed are definitely visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	FullTransactionId definitely_needed;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* XIDs &amp;lt; are not considered to be running by any backend */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// rows with XID &amp;lt; maybe_needed can definitely be cleaned up
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	FullTransactionId maybe_needed;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Added &lt;code&gt;ComputeXidHorizons()&lt;/code&gt; for more precise horizon calculation (storing xmin and removable xid information). This function still needs to iterate PGPROC. The calculation range is &lt;code&gt;XID &amp;gt;= maybe_needed &amp;amp;&amp;amp; XID &amp;lt; definitely_needed&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Added &lt;code&gt;GlobalVisTestShouldUpdate()&lt;/code&gt; to determine whether boundaries need recalculation.&lt;/p&gt;
&lt;p&gt;First, understand the variable &lt;code&gt;ComputeXidHorizonsResultLastXmin&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; TransactionId ComputeXidHorizonsResultLastXmin; &lt;span style="color:#75715e"&gt;// last precisely computed xmin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GlobalVisTestShouldUpdate&lt;/span&gt;(GlobalVisState &lt;span style="color:#f92672"&gt;*&lt;/span&gt;state)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// If xmin=0, need to recalculate boundaries. This is an edge case for tuples created during database initialization.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(ComputeXidHorizonsResultLastXmin))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * If the maybe_needed/definitely_needed boundaries are the same, it&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * unlikely to be beneficial to refresh boundaries.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When maybe_needed equals definitely_needed, no need to recalculate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Uses FullTransactionIdFollowsOrEquals (not strict equality)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// &amp;#34;Greater than&amp;#34; scenario: no rows definitely visible. &amp;#34;Equal&amp;#34; scenario: only one row definitely visible.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;FullTransactionIdFollowsOrEquals&lt;/span&gt;(state&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;maybe_needed,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 state&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;definitely_needed))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* does the last snapshot built have a different xmin? */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// When the last snapshot&amp;#39;s xmin equals the last precisely computed xmin, no need to recalculate boundaries
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; RecentXmin &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; ComputeXidHorizonsResultLastXmin;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We can see that &lt;code&gt;maybe_needed&lt;/code&gt; and &lt;code&gt;definitely_needed&lt;/code&gt; are similar to snapshot xmin/xmax, but with an additional layer of computation. First calculate boundaries, then further refine with &lt;code&gt;ComputeXidHorizons()&lt;/code&gt;. &lt;code&gt;GlobalVisTestShouldUpdate&lt;/code&gt; reduces the scenarios where boundaries need recalculation, and &lt;code&gt;ComputeXidHorizons()&lt;/code&gt; is also more efficient for precise calculation.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Optimization Results
 &lt;div id="optimization-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#optimization-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Recommended article on PostgreSQL snapshot optimization:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The before-and-after comparison is striking:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a346095be5a7.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In pg13 production environments, &lt;code&gt;GetSnapshotData&lt;/code&gt; consistently shows high performance overhead. (No screenshot, so I&amp;rsquo;ll borrow another expert&amp;rsquo;s chart:)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8cd67db0e65f.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Snapshot References
 &lt;div id="snapshot-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Books:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL in Action&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Internals: Deep Dive into Transaction Processing&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Database Kernel Analysis&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf" target="_blank" rel="noreferrer"&gt;https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Official resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Concurrency_control" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Concurrency_control&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Hint_Bits" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Hint_Bits&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/10/storage-page-layout.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/10/storage-page-layout.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/pageinspect.html3" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/pageinspect.html3&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Essential PostgreSQL transaction reads (interdb):&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Source code experts:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/102920988" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/102920988&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/127955762" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/127955762&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/125023923" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/125023923&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL snapshot optimization performance comparison:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://brandur.org/postgres-atomicity" target="_blank" rel="noreferrer"&gt;https://brandur.org/postgres-atomicity&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Visibility Checking
 &lt;div id="visibility-checking" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#visibility-checking" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;With a snapshot, we can determine tuple visibility. Let&amp;rsquo;s review the key information (ignoring subtransactions for now): tuple header transaction info, snapshot info, and CLOG transaction status (before SetHintBits).&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tuple header has: xmin, xmax, cmin, cmax, infomask, etc.&lt;/li&gt;
&lt;li&gt;Snapshot data has: snapshot xmin, xmax, xip_list, curcid, etc.&lt;/li&gt;
&lt;li&gt;CLOG has additional transaction status info, which may also be written to infomask as hint bits.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Different snapshot types have slightly different visibility rules:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleSatisfiesVisibility&lt;/span&gt;(HeapTuple tup, Snapshot snapshot, Buffer buffer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;switch&lt;/span&gt; (snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;snapshot_type)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SNAPSHOT_MVCC:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleSatisfiesMVCC&lt;/span&gt;(tup, snapshot, buffer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; SNAPSHOT_NON_VACUUMABLE:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleSatisfiesNonVacuumable&lt;/span&gt;(tup, snapshot, buffer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Each snapshot type has its own visibility rules. Here we&amp;rsquo;ll use the most common &lt;code&gt;SNAPSHOT_MVCC&lt;/code&gt; visibility rules to understand tuple visibility.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleSatisfiesMVCC&lt;/span&gt;(HeapTuple htup, Snapshot snapshot,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Buffer buffer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	HeapTupleHeader tuple &lt;span style="color:#f92672"&gt;=&lt;/span&gt; htup&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_data; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;ItemPointerIsValid&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;htup&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_self)); &lt;span style="color:#75715e"&gt;// lp valid, i.e., tuple valid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(htup&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_tableOid &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; InvalidOid); &lt;span style="color:#75715e"&gt;// oid valid, i.e., table valid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// t_xmin not committed: the transaction that INSERTed or UPDATEd this new tuple has not committed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// In htup_details.h, macro: HeapTupleHeaderXminCommitted() is ((tup)-&amp;gt;t_infomask &amp;amp; HEAP_XMIN_COMMITTED) != 0
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// So if (!HeapTupleHeaderXminCommitted(tuple)) means the tuple infomask does not have HEAP_XMIN_COMMITTED
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Literally: t_xmin has not committed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderXminCommitted&lt;/span&gt;(tuple)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If a transaction updated the tuple but then aborted or failed, this tuple&amp;#39;s xmin is the failed transaction ID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the inserting transaction failed, directly return invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderXminInvalid&lt;/span&gt;(tuple))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// When infomask has HEAP_MOVED_OFF, visibility is judged separately for VACUUM tuples, with hint bits set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Used by pre-9.0 binary upgrades */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_MOVED_OFF)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			TransactionId xvac &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetXvac&lt;/span&gt;(tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(xvac))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(xvac, snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(xvac))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// When infomask has HEAP_MOVED_IN, visibility is judged separately for VACUUM tuples, with hint bits set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Used by pre-9.0 binary upgrades */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_MOVED_IN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			TransactionId xvac &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetXvac&lt;/span&gt;(tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(xvac))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(xvac, snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(xvac))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// When the tuple was written by the current transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmin&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid) &lt;span style="color:#75715e"&gt;// tuple cid &amp;gt;= snapshot current command id
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;// tuple was inserted after visibility check started; invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_INVALID) &lt;span style="color:#75715e"&gt;// infomask has HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true; &lt;span style="color:#75715e"&gt;// tuple not deleted; visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// A pure insert, whether committed, not yet committed, or rolled back, has HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// But this check is under the &amp;#34;written by current transaction&amp;#34; condition, so:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// Tuple inserted by current transaction, not committed (logically equivalent to not deleted within the same tx),
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// and t_cid &amp;lt; curcid → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// xmax is set in two scenarios: 1) tuple locked, 2) tuple deleted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Even without HEAP_XMAX_INVALID, the tuple may not be deleted — it may just be locked
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Locked tuples have xmax set but are visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HEAP_XMAX_IS_LOCKED_ONLY&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask))	&lt;span style="color:#75715e"&gt;/* not deleter */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// HEAP_XMAX_IS_MULTI is set when multiple transactions acquire locks on the same row, producing MultiXactId
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Still judging visibility under xmax lock scenarios
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_IS_MULTI)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				TransactionId xmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				xmax &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleGetUpdateXid&lt;/span&gt;(tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* not LOCKED_ONLY, so it has to have an xmax */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(xmax));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* updating subtransaction must have aborted */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// If xmax is not the current transaction, visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(xmax))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// If xmax is the current transaction, judge by command id:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;// snapshot acquired before update/delete → tuple was visible at snapshot time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmax&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;	&lt;span style="color:#75715e"&gt;/* updated after scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;/* updated before scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// The following scenario: a subtransaction&amp;#39;s delete command was rolled back, need SetHintBits HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Delete rolled back, so tuple is visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* deleting subtransaction must have aborted */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMAX_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;							InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// cmax is the command ID that deleted the tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// If tuple cmax &amp;gt;= snapshot curcid: delete happened after snapshot scan → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// If tuple cmax &amp;lt; snapshot curcid: delete happened before snapshot scan → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmax&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;	&lt;span style="color:#75715e"&gt;/* deleted after scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;/* deleted before scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// XidInMVCCSnapshot() checks if xid was in-progress at snapshot time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// &amp;#34;in-progress&amp;#34; means: 1. snapshot xmin &amp;lt;= xid &amp;lt; snapshot xmax AND xid in xip_list 2. xid &amp;gt;= snapshot xmax
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// The xid below is t_xmin
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// So this means: if t_xmin was in-progress at snapshot time → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Equivalent to: t_xmin not committed → invisible. This seems redundant.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Because this whole block is under !HeapTupleHeaderXminCommitted(tuple) — also meaning t_xmin not committed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// But with the preceding checks, this else if is reasonable. Meaning:
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// t_xmin not committed, tuple not deleted, not current transaction → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple), snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If t_xmin transaction committed, SetHintBits HEAP_XMIN_COMMITTED
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// This seems odd: the entire block is for t_xmin NOT committed, how could it be committed here?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// And if this case really happens, why no visibility judgment?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If t_xmin transaction did not commit, SetHintBits HEAP_XMIN_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* it must have aborted or crashed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMIN_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// t_xmin transaction not committed, return invisible again. Similar to XidInMVCCSnapshot() above?
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Currently: not committed, and doesn&amp;#39;t satisfy XidInMVCCSnapshot() (xid was not in-progress at snapshot time)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// The only case: transaction hadn&amp;#39;t started at snapshot time, later started, still not committed → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// xmin-not-committed visibility judgments finally done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Everything after the else is for when xmin IS committed (hint bit HEAP_XMIN_COMMITTED is set)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// xmin is committed, but maybe not according to our snapshot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* xmin is committed, but maybe not according to our snapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If infomask has no HEAP_XMIN_FROZEN AND xmin was in-progress at snapshot time → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Translating the if: at snapshot time, xmin was not committed; at visibility check time,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// tuple xmin is committed but not marked FROZEN → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Even though tuple xmin is now committed, from the current snapshot&amp;#39;s perspective it was still in-progress
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderXminFrozen&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmin&lt;/span&gt;(tuple), snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;		&lt;span style="color:#75715e"&gt;/* treat as still in progress */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// HEAP_XMAX_INVALID means tuple not deleted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// This if means: tuple committed, and was committed at snapshot time, and not deleted (no delete marker at all) → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_INVALID)	&lt;span style="color:#75715e"&gt;/* xid invalid or aborted */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Tuple has xmax, but it&amp;#39;s not a delete — it&amp;#39;s a lock marker
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// This if means: tuple committed, was committed at snapshot time, has xmax but xmax is a lock → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HEAP_XMAX_IS_LOCKED_ONLY&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// HEAP_XMAX_IS_MULTI means the tuple is in shared-row-lock state, typically when multiple transactions process one row
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_IS_MULTI)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		TransactionId xmax;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* already checked above */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;HEAP_XMAX_IS_LOCKED_ONLY&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Get the transaction ID that updated the tuple
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		xmax &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;HeapTupleGetUpdateXid&lt;/span&gt;(tuple);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* not LOCKED_ONLY, so it has to have an xmax */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;TransactionIdIsValid&lt;/span&gt;(xmax));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the shared-row-lock tuple&amp;#39;s transaction ID is the current transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(xmax))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// tuple cmax &amp;gt;= snapshot curcid: tuple not yet deleted at snapshot time → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmax&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;	&lt;span style="color:#75715e"&gt;/* deleted after scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// tuple cmax &amp;lt; snapshot curcid: tuple already deleted at snapshot time → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;/* deleted before scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the shared-row-lock tuple&amp;#39;s transaction ID is not the current transaction, and xmax was in-progress at snapshot time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// This if means: xmin committed, tuple not deleted, MULTI XMAX marker present, xmax not yet committed at snapshot time → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(xmax, snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If the shared-row-lock tuple transaction committed → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(xmax))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;		&lt;span style="color:#75715e"&gt;/* updating transaction committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* it must have aborted or crashed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Updating transaction aborted or crashed → still visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Tuple xmin committed, xmax not yet marked committed, not yet deleted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Seems !HEAP_XMAX_COMMITTED differs from HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// This looks like: tuple experienced a delete, but the delete transaction hasn&amp;#39;t committed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// While HEAP_XMAX_INVALID above is: definitely no delete or delete aborted/rolled back, so can directly return true
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;(tuple&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;t_infomask &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; HEAP_XMAX_COMMITTED))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If xmax is the same as the checking transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;TransactionIdIsCurrentTransactionId&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Same old pattern: visibility via command id
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// cmax &amp;gt;= snapshot curcid: delete happened after snapshot → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetCmax&lt;/span&gt;(tuple) &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; snapshot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;curcid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;	&lt;span style="color:#75715e"&gt;/* deleted after scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// cmax &amp;lt; snapshot curcid: delete happened before snapshot → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;	&lt;span style="color:#75715e"&gt;/* deleted before scan started */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Delete transaction not committed, and xmax not the checking transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If xmax was in-progress at snapshot time → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple), snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Confirm xmax delete transaction aborted or failed; SetHintBits HEAP_XMAX_INVALID
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Similar to HEAP_XMAX_INVALID above → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;TransactionIdDidCommit&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* it must have aborted or crashed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMAX_INVALID,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						InvalidTransactionId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* xmax transaction committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Remaining case: xmax delete transaction committed. SetHintBits HEAP_XMAX_COMMITTED
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Visibility should be judged here, but it&amp;#39;s deferred to the last few lines, because this is a sub-case of a larger condition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SetHintBits&lt;/span&gt;(tuple, buffer, HEAP_XMAX_COMMITTED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;					&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* xmax is committed, but maybe not according to our snapshot */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// xmax delete transaction now committed, but was in-progress at snapshot time → visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;XidInMVCCSnapshot&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;HeapTupleHeaderGetRawXmax&lt;/span&gt;(tuple), snapshot))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;		&lt;span style="color:#75715e"&gt;/* treat as still in progress */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* xmax transaction committed */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;// Only remaining case: xmax committed and was not in-progress at snapshot time → invisible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The entire visibility judgment source code looks complex. Stripping out the &lt;code&gt;SetHintBits&lt;/code&gt; parts and the convoluted if-else chains, focusing only on the core visibility rules, the key points are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Core visibility rule logic:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Delete committed → tuple invisible&lt;/li&gt;
&lt;li&gt;Insert committed, delete rolled back → tuple visible&lt;/li&gt;
&lt;li&gt;Insert committed, delete not committed → current transaction compares cid; other transactions see the tuple as visible&lt;/li&gt;
&lt;li&gt;Insert rolled back → tuple invisible&lt;/li&gt;
&lt;li&gt;Insert not committed → same transaction compares cmin; other transactions see the tuple as invisible&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Visibility checking involves two time points: the check time and the snapshot time. The logic distinguishes between the same transaction (checking transaction = snapshot transaction) and different transactions.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Same transaction&lt;/strong&gt;: compare tuple cmin/cmax against &lt;code&gt;snapshot-&amp;gt;curcid&lt;/code&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;cmin &amp;gt;= snapshot-&amp;gt;curcid&lt;/code&gt;: tuple inserted after snapshot → invisible. Otherwise visible.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;cmax &amp;gt;= snapshot-&amp;gt;curcid&lt;/code&gt;: tuple deleted after snapshot → visible. Otherwise invisible.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Different transactions&lt;/strong&gt;: use &lt;code&gt;XidInMVCCSnapshot()&lt;/code&gt; to check whether xid (t_xmin or t_xmax) was in-progress at snapshot time.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;xmin was in-progress at snapshot time → invisible.&lt;/li&gt;
&lt;li&gt;xmax was in-progress at snapshot time → visible.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Beyond basic DML operations, there are 4 additional cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;VACUUM tuple insert/delete visibility&lt;/li&gt;
&lt;li&gt;Lock-only marker (&lt;code&gt;HEAP_XMAX_IS_LOCKED_ONLY&lt;/code&gt;): tuple visible&lt;/li&gt;
&lt;li&gt;MultiXact state (&lt;code&gt;HEAP_XMAX_IS_MULTI&lt;/code&gt;): visibility for tuples under multi-transaction locks&lt;/li&gt;
&lt;li&gt;Frozen tuples: visibility when frozen marker is set&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;MultiXact
 &lt;div id="multixact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multixact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What Is MultiXact?
 &lt;div id="what-is-multixact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-multixact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When multiple transactions lock the same row, there may be multiple associated transaction IDs on the tuple. PostgreSQL groups multiple transaction IDs together and manages them with a single &lt;code&gt;MultiXactId&lt;/code&gt;. The relationship between TransactionId and MultiXactId is many-to-one.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ee67ad9bb95b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Like TransactionId, MultiXactId is also 32-bit and also subject to wraparound.&lt;/p&gt;
&lt;p&gt;MultiXactId values 0 and 1 are reserved for system use. Allocatable MultiXactIds start from 2.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Source: src&lt;span style="color:#f92672"&gt;/&lt;/span&gt;include&lt;span style="color:#f92672"&gt;/&lt;/span&gt;access&lt;span style="color:#f92672"&gt;/&lt;/span&gt;multixact.h
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define InvalidMultiXactId	((MultiXactId) 0)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define FirstMultiXactId	((MultiXactId) 1)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MaxMultiXactId		((MultiXactId) 0xFFFFFFFF)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Row Lock Types
 &lt;div id="row-lock-types" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#row-lock-types" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;MultiXact only exists when rows are locked. MultiXact defines 6 states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;enum&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusForKeyShare &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x00&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusForShare &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x01&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusForNoKeyUpdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x02&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusForUpdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x03&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* an update that doesn&amp;#39;t touch &amp;#34;key&amp;#34; columns */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusNoKeyUpdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x04&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* other updates, and delete */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MultiXactStatusUpdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0x05&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} MultiXactStatus;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There are 4 explicitly declarable row lock states: &lt;code&gt;ForKeyShare&lt;/code&gt;, &lt;code&gt;ForShare&lt;/code&gt;, &lt;code&gt;ForNoKeyUpdate&lt;/code&gt;, &lt;code&gt;ForUpdate&lt;/code&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;MultiXact Infomask Flags
 &lt;div id="multixact-infomask-flags" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multixact-infomask-flags" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL marks row locks on xmax and records them in infomask.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/access/htup_details.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_KEYSHR_LOCK	0x0010	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* xmax is a key-shared locker */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_EXCL_LOCK		0x0040	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* xmax is exclusive locker */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_LOCK_ONLY		0x0080	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* xmax, if valid, is only a locker */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_SHR_LOCK	(HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_LOCK_MASK	(HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;						 HEAP_XMAX_KEYSHR_LOCK)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define HEAP_XMAX_IS_MULTI		0x1000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* t_xmax is a MultiXactId */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here we focus on the &lt;code&gt;HEAP_XMAX_IS_MULTI&lt;/code&gt; flag. Only when multiple transactions hold shared locks on the same row is a true MultiXact ID generated and this flag set.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- initially one row
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+----------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;742&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Session 1&lt;/th&gt;
 &lt;th&gt;Session 2&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;lzldb=# begin; &lt;br /&gt; BEGIN &lt;br /&gt;lzldb=*# select * from lzl1 for share; &lt;br /&gt;a &lt;br /&gt;&amp;mdash; &lt;br /&gt;1&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;lzldb=# begin; &lt;br /&gt; BEGIN &lt;br /&gt;lzldb=*# select * from lzl1 for share;&lt;br /&gt;a &lt;br /&gt;&amp;mdash; &lt;br /&gt;1&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;lzldb=*# update lzl1 set a=2; &amp;ndash;hang&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;commit；&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;UPDATE 1 &amp;ndash;update completed&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check tuple xmax and infomask
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,t_xmin,t_xmax,(t_infomask&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt;)&lt;span style="color:#f92672"&gt;!=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; is_multixact &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; is_multixact 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+--------+--------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;742&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;744&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;HEAP_XMAX_IS_MULTI&lt;/code&gt; is &lt;code&gt;0x1000&lt;/code&gt; in hex, which is 4096 in decimal. Using &lt;code&gt;(t_infomask&amp;amp;4096)!=0 is_multixact&lt;/code&gt; shows whether the tuple uses a MultiXact ID. From the example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MultiXact IDs have their own value space, separate from transaction IDs.&lt;/li&gt;
&lt;li&gt;MultiXact IDs are generally smaller than transaction IDs — here t_xmax &amp;lt; t_xmin.&lt;/li&gt;
&lt;li&gt;For an UPDATE, old and new tuples typically share the same xmax. In MultiXact scenarios, they may differ.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;MultiXact SLRU
 &lt;div id="multixact-slru" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multixact-slru" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Although &lt;code&gt;src/backend/access/transam/multixact.c&lt;/code&gt; defines many variables and functions at the top — &lt;code&gt;page&lt;/code&gt;, &lt;code&gt;member&lt;/code&gt;, &lt;code&gt;membergroup&lt;/code&gt;, &lt;code&gt;offset&lt;/code&gt; — they are all about defining variable values and conversion functions between them.&lt;/p&gt;
&lt;p&gt;Before reading &lt;code&gt;multixact.c&lt;/code&gt;, understand a few macros:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;src/include/c.h&lt;/code&gt; defines &lt;code&gt;MultiXactOffset&lt;/code&gt; as a 32-bit type:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; uint32 MultiXactOffset;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;src/include/access/slru.h&lt;/code&gt; defines how many SLRU pages per segment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define SLRU_PAGES_PER_SEGMENT	32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Back to the top of &lt;code&gt;src/backend/access/transam/multixact.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;define &lt;span style="color:#a6e22e"&gt;MULTIXACT_OFFSETS_PER_PAGE&lt;/span&gt; (BLCKSZ &lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(MultiXactOffset)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// MULTIXACT_OFFSETS_PER_PAGE = 8k / 32B = 2048. One page stores 2048 offset markers, i.e., 2048 MultiXactIds.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MultiXactIdToOffsetPage(xid) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	((xid) / (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Convert xid to the page where the corresponding record resides: xid / 2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MultiXactIdToOffsetEntry(xid) \
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	((xid) % (MultiXactOffset) MULTIXACT_OFFSETS_PER_PAGE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Convert xid to the offset within the page: xid % 2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define MultiXactIdToOffsetSegment(xid) (MultiXactIdToOffsetPage(xid) / SLRU_PAGES_PER_SEGMENT)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Convert xid to the segment: xid / 2048 / 32
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now read the comments at the top of the source:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Defines for MultiXactOffset page sizes. A page is the same BLCKSZ as is
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * used everywhere else in Postgres.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Note: because MultiXactOffsets are 32 bits and wrap around at 0xFFFFFFFF,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * MultiXact page numbering also wraps around at
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE, and segment numbering at
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * 0xFFFFFFFF/MULTIXACT_OFFSETS_PER_PAGE/SLRU_PAGES_PER_SEGMENT. We need
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * take no explicit notice of that fact in this module, except when comparing
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * segment and page numbers in TruncateMultiXact (see
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * MultiXactOffsetPagePrecedes).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since &lt;code&gt;MultiXactOffsets&lt;/code&gt; are 32-bit and subject to wraparound:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MultiXact page numbering wraps at &lt;code&gt;0xFFFFFFFF / MULTIXACT_OFFSETS_PER_PAGE = 2^32 / 2048 = 2^21&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Segment numbering wraps at &lt;code&gt;0xFFFFFFFF / MULTIXACT_OFFSETS_PER_PAGE / SLRU_PAGES_PER_SEGMENT = 2^32 / 2^11 / 2^5 = 2^16&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;TruncateMultiXact()&lt;/code&gt; cleans up these segments and page numbers. It is called by VACUUM.&lt;/p&gt;

&lt;h3 class="relative group"&gt;The pg_multixact Directory
 &lt;div id="the-pg_multixact-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pg_multixact-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Like CLOG and SUBTRANS, MultiXact logs use an SLRU buffer pool implementation. The &lt;code&gt;pg_multixact&lt;/code&gt; directory has only two subdirectories: &lt;code&gt;members&lt;/code&gt; and &lt;code&gt;offsets&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;pg@lzl pg_multixact&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;drwx------ &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; 21:29 members
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;drwx------ &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; pg pg &lt;span style="color:#ae81ff"&gt;4096&lt;/span&gt; Feb &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; 21:29 offsets&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;One MultiXactId corresponds to multiple TransactionIds — the members. The offset is the starting position of each MultiXact.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/39f86a3494b8.png" alt="image" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; mXactCacheEnt
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MultiXactId multi; &lt;span style="color:#75715e"&gt;// one MultiXactId
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;		nmembers;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	dlist_node	node;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MultiXactMember members[FLEXIBLE_ARRAY_MEMBER]; &lt;span style="color:#75715e"&gt;// multiple TransactionIds; expanded via MultiXactIdExpand() if needed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} mXactCacheEnt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;multixact.h&lt;/code&gt; defines &lt;code&gt;MultiXactMember&lt;/code&gt; as just a single transaction ID and its status:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;typedef&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; MultiXactMember
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	TransactionId xid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	MultiXactStatus status;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} MultiXactMember;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;MultiXact References
 &lt;div id="multixact-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multixact-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/routine-vacuuming.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pgpedia.info/m/multixact-id.html" target="_blank" rel="noreferrer"&gt;https://pgpedia.info/m/multixact-id.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/15/explicit-locking.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/15/explicit-locking.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.modb.pro/db/14939" target="_blank" rel="noreferrer"&gt;https://www.modb.pro/db/14939&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.highgo.ca/2020/06/12/transactions-in-postgresql-and-their-mechanism/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2020/06/12/transactions-in-postgresql-and-their-mechanism/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Two-Phase Commit (2PC) Transactions
 &lt;div id="two-phase-commit-2pc-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#two-phase-commit-2pc-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What Is a 2PC Transaction?
 &lt;div id="what-is-a-2pc-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-2pc-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Transaction atomicity requires that a transaction either completes entirely or rolls back entirely. In distributed transactions spanning multiple connected databases, a consistent state must be provided to satisfy distributed transaction atomicity. Like other databases, PostgreSQL provides the Two-Phase Commit Protocol (2PC).&lt;/p&gt;
&lt;p&gt;There are many distributed transaction implementations; 2PC is the most fundamental and common. Distributed transactions encompass atomic commit, atomic visibility, and global consistency. 2PC is only an implementation for atomic commit.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PREPARE TRANSACTION
 &lt;div id="prepare-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#prepare-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Foreign Data Wrappers (FDWs) can handle 2PC internally. PostgreSQL also provides an explicit way to use 2PC: &lt;code&gt;PREPARE TRANSACTION&lt;/code&gt;. Once issued, the prepared transaction is detached from the session; its state is persisted. &lt;code&gt;PREPARE TRANSACTION&lt;/code&gt; is not designed for use in applications or interactive sessions — unless you&amp;rsquo;re writing a transaction manager — so it is recommended (and default) to keep it disabled.&lt;/p&gt;
&lt;p&gt;Syntax:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; transaction_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; PREPARED transaction_id 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt; PREPARED transaction_id&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;transaction_id&lt;/code&gt; here is not the internal transaction ID — it&amp;rsquo;s just a user-declared string.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;PREPARE TRANSACTION&lt;/code&gt; must be inside a transaction block, started with &lt;code&gt;BEGIN&lt;/code&gt; or &lt;code&gt;START TRANSACTION&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;max_prepared_transactions&lt;/code&gt; controls the number of prepared transactions. Default is 0 (disabled). Must be increased to use prepared transactions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Starting a Prepared Transaction
 &lt;div id="starting-a-prepared-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#starting-a-prepared-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_xacts ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; gid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepared &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----+-------------------------------+-------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;719&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;866022&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt; prepared &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt; PREPARED 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_xacts ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; gid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepared &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----+----------+-------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;The pg_twophase Directory
 &lt;div id="the-pg_twophase-directory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pg_twophase-directory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;As mentioned, prepared transactions are session-independent. When a prepared transaction is started, its state information is stored in a cache. To ensure the transaction is not lost, prepared transactions are also persisted to the &lt;code&gt;pg_twophase&lt;/code&gt; directory. This doesn&amp;rsquo;t only happen on shutdown — it&amp;rsquo;s tied to &lt;code&gt;checkpoint&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/access/transam/twophase.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CheckPointTwoPhase&lt;/span&gt;(XLogRecPtr redo_horizon)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_START&lt;/span&gt;(); &lt;span style="color:#75715e"&gt;// checkpoint start
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;fsync_fname&lt;/span&gt;(TWOPHASE_DIR, true); &lt;span style="color:#75715e"&gt;// call fsync to flush to disk
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;TRACE_POSTGRESQL_TWOPHASE_CHECKPOINT_DONE&lt;/span&gt;(); &lt;span style="color:#75715e"&gt;// checkpoint done
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s test: start a prepared transaction and run a checkpoint:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl pg_twophase]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CHECKPOINT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[pg&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzl pg_twophase]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 pg pg 116 Apr 29 16:33 000002D0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Orphaned Prepared Transactions
 &lt;div id="orphaned-prepared-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#orphaned-prepared-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If a prepared transaction is never completed (neither committed nor rolled back), and since it is session-independent, it will persist unless explicitly terminated. (Normally, a regular transaction rolls back when the session disconnects.) This is an &lt;strong&gt;orphaned prepared transaction&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Orphaned prepared transactions hold locks and tuple resources indefinitely, preventing VACUUM from reclaiming dead tuples and even blocking transaction ID wraparound. For example, if a prepared transaction is forgotten and not committed or rolled back, and there is no external transaction management monitoring it, it may go unnoticed and exist forever — ultimately causing severe problems. Therefore, it&amp;rsquo;s recommended to keep &lt;code&gt;max_prepared_transactions=0&lt;/code&gt; (default) or monitor prepared transactions via the &lt;code&gt;pg_prepared_xacts&lt;/code&gt; view.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a simulation of an orphaned prepared transaction causing indefinite blocking:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Start a prepared transaction and disconnect
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;q
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After disconnecting, the prepared transaction still exists
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_xacts ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;transaction&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; gid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepared &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----+-------------------------------+-------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;721&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;597678&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- DDL blocked
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; b int;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,relation,pid,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32808&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+----------+-------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32808&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26136&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;32808&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- End the prepared transaction; DDL completes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt; prepared &lt;span style="color:#e6db74"&gt;&amp;#39;lzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt; PREPARED
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; b int;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;2PC Transaction References
 &lt;div id="2pc-transaction-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2pc-transaction-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="http://postgres.cn/docs/13/sql-prepare-transaction.html" target="_blank" rel="noreferrer"&gt;http://postgres.cn/docs/13/sql-prepare-transaction.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.highgo.ca/2020/01/28/understanding-prepared-transactions-and-handling-the-orphans/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2020/01/28/understanding-prepared-transactions-and-handling-the-orphans/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Atomic_Commit_of_Distributed_Transactions&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Subtransactions
 &lt;div id="subtransactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subtransactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;What Is a Subtransaction?
 &lt;div id="what-is-a-subtransaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-subtransaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A regular transaction can only commit or roll back as a whole. Subtransactions allow partial rollback.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;SAVEPOINT p1&lt;/code&gt; places a savepoint marker inside a transaction. You cannot directly commit a subtransaction — subtransactions are committed when the parent transaction commits. However, you can use &lt;code&gt;ROLLBACK TO SAVEPOINT p1&lt;/code&gt; to roll back to that savepoint.&lt;/p&gt;
&lt;p&gt;Subtransactions are useful for bulk data loading. If a transaction contains multiple subtransactions and one small segment fails, only that segment needs to be retried — not the entire transaction.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Using Subtransactions in SQL
 &lt;div id="using-subtransactions-in-sql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-subtransactions-in-sql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT savepoint_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt; [ &lt;span style="color:#66d9ef"&gt;WORK&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRANSACTION&lt;/span&gt; ] &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; [ SAVEPOINT ] savepoint_name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;RELEASE [ SAVEPOINT ] savepoint_name&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Savepoint statements must be inside a transaction block.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SAVEPOINT&lt;/code&gt; creates a savepoint; &lt;code&gt;ROLLBACK TO&lt;/code&gt; rolls back to the named savepoint; &lt;code&gt;RELEASE&lt;/code&gt; erases the savepoint without rolling back subtransaction data.&lt;/li&gt;
&lt;li&gt;Cursors are not affected by savepoint operations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; savepoint p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; savepoint p2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; savepoint p3;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SAVEPOINT
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; savepoint p2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; xmin,xmax,cmin,a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; cmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+------+------+---
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;731&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;732&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Rolling back to p2 also rolled back p3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; vlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;731&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;732&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;733&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;734&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASNULL,HEAP_XMIN_INVALID,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Subtransaction infomask is not very different from regular transactions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Multiple commands within the same transaction are differentiated by cid and HEAP_XMIN_INVALID, etc.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Subtransaction writes also consume transaction IDs, and cid increments within the parent transaction framework.&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Other Sources of Subtransactions
 &lt;div id="other-sources-of-subtransactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#other-sources-of-subtransactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Even without explicit &lt;code&gt;SAVEPOINT&lt;/code&gt;, subtransactions can be created by other means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;EXCEPTION&lt;/code&gt; blocks trigger subtransactions. This is common in tools and frameworks and easily overlooked. Every &lt;code&gt;EXCEPTION&lt;/code&gt; creates a subtransaction.&lt;/p&gt;
&lt;p&gt;Syntax: &lt;code&gt;BEGIN / EXCEPTION WHEN .. / END&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Reference: &lt;a href="https://fluca1978.github.io/2020/02/05/PLPGSQLExceptions.html" target="_blank" rel="noreferrer"&gt;https://fluca1978.github.io/2020/02/05/PLPGSQLExceptions.html&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;PL/Python code using &lt;code&gt;plpy.subtransaction()&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Subtransaction SLRU Cache
 &lt;div id="subtransaction-slru-cache" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subtransaction-slru-cache" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Subtransaction commit logs are in &lt;code&gt;pg_xact&lt;/code&gt;. Parent-child relationships are stored in &lt;code&gt;pg_subtrans&lt;/code&gt;, which caches the mapping of subXID to parent XID. When PostgreSQL needs to look up a subXID, it calculates which memory page the ID resides on and searches within that page. If the page is not in cache, it evicts a page and loads the required page from &lt;code&gt;pg_subtrans&lt;/code&gt; into memory. Large numbers of subtransaction cache misses consume system I/O and CPU.&lt;/p&gt;
&lt;p&gt;The subtransaction buffer is only 32 pages, hardcoded in the source.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/access/subtrans.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* Number of SLRU buffers to use for subtrans */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\#&lt;/span&gt;define NUM_SUBTRANS_BUFFERS &lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Buffer default is 8KB; xid is 32 bits (4 bytes). Therefore:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SUBTRANS_BUFFER size: &lt;code&gt;32 * 8K = 256KB&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;SUBTRANS_BUFFER can store at most: &lt;code&gt;32 * 8K / 4 = 65,536&lt;/code&gt; xids&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f1a4a6d13c77.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Finding a subtransaction&amp;rsquo;s position in a page by transaction ID:&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/backend/access/transam/subtrans.c&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/* We need four bytes per xact */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define SUBTRANS_XACTS_PER_PAGE (BLCKSZ / sizeof(TransactionId))
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Each page can store up to 8K / 4 bytes = 2048 subtransaction IDs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToPage(xid) ((xid) / (TransactionId) SUBTRANS_XACTS_PER_PAGE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Calculate page number from subtransaction xid: xid / 2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define TransactionIdToEntry(xid) ((xid) % (TransactionId) SUBTRANS_XACTS_PER_PAGE)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;// Calculate offset within page from subtransaction xid: xid % 2048
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Subtransaction xids may not be densely packed within a page — a page may hold fewer than 2048 subtransaction IDs.&lt;/p&gt;

&lt;h3 class="relative group"&gt;The Dangers of Subtransactions
 &lt;div id="the-dangers-of-subtransactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-dangers-of-subtransactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;1. PGPROC_MAX_CACHED_SUBXIDS Overflow&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PGPROC_MAX_CACHED_SUBXIDS&lt;/code&gt; is not a GUC parameter — it&amp;rsquo;s hardcoded. You can only change it by modifying the source.&lt;/p&gt;
&lt;p&gt;Source: &lt;code&gt;src/include/storage/proc.h&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*Each backend has a subtransaction cache limit of PGPROC_MAX_CACHED_SUBXIDS.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*We must track whether the cache has overflowed (i.e., the transaction has at least one subtransaction that couldn&amp;#39;t be cached).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*If no cache has overflowed, we can be sure that an xid not in the PGPROC array is definitely not a running transaction.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*If there is an overflow, we must consult pg_subtrans.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	*/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;#define PGPROC_MAX_CACHED_SUBXIDS 64	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* XXX guessed-at value */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; XidCache
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		TransactionId xids[PGPROC_MAX_CACHED_SUBXIDS];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	};&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Two key takeaways from this source:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Every backend&amp;rsquo;s subtransaction cache is capped at &lt;code&gt;PGPROC_MAX_CACHED_SUBXIDS&lt;/code&gt;: 64 subtransactions.&lt;/li&gt;
&lt;li&gt;Beyond 64 subtransactions, they overflow to the &lt;code&gt;pg_subtrans&lt;/code&gt; directory.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;An expert&amp;rsquo;s benchmark: performance drops when subtransactions just exceed 64. So it&amp;rsquo;s best to keep per-session subtransactions below 64.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6ac0dc4add28.png" alt="image" /&gt;
Reference: &lt;a href="https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful" target="_blank" rel="noreferrer"&gt;https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Subtransactions Causing MultiXact Contention&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reference: &lt;a href="https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/" target="_blank" rel="noreferrer"&gt;https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;FOR UPDATE&lt;/code&gt; itself is a row-level exclusive lock and should not generate a MultiXact ID. But in this scenario, multiple MultiXact waits occurred, causing a cliff-like performance drop:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LWLock:MultiXactMemberControlLock&lt;/li&gt;
&lt;li&gt;LWLock:MultiXactOffsetControlLock&lt;/li&gt;
&lt;li&gt;LWLock:multixact_member&lt;/li&gt;
&lt;li&gt;LwLock:multixact_offset&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It was later discovered that the Django framework was issuing subtransaction statements:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;some&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SAVEPOINT save;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; [the same &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;];&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Replica Performance Cliff&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reference: &lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A single long transaction with a savepoint subtransaction can also cause a performance cliff on replicas.&lt;/p&gt;
&lt;p&gt;If a read occurs on a snapshot taken on the primary, the snapshot includes xmin, xmax, the txip transaction list, and subxip (the list of in-progress subtransactions). &lt;strong&gt;However, neither the original arrays nor the snapshot are directly shared with replicas — replicas read all needed data from WAL.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a4c7e36c274a.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;When subtransactions exist, a single long-running transaction can cause replica performance to drop off a cliff:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/06211a3788ce.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Production Performance Cliff&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When the database is busy and many subtransactions exist, performance can drop sharply, accompanied by subtransaction wait events. This scenario can occur even when per-session subtransactions don&amp;rsquo;t exceed 64, and even on the primary (not just replicas).&lt;/p&gt;
&lt;p&gt;We found that a tool (OGG) defaulted to 50 subtransactions. Reducing the subtransaction count in that tool to 10–20 alleviated the database performance issue.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Subtransaction usage recommendations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Besides explicit &lt;code&gt;SAVEPOINT&lt;/code&gt;, EXCEPTION blocks, frameworks, and tools can also generate subtransactions.&lt;/li&gt;
&lt;li&gt;If you have replica query workloads, &lt;strong&gt;disable subtransactions&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Use row locks cautiously. &lt;code&gt;FOR UPDATE&lt;/code&gt; + subtransactions can also trigger MultiXactId issues.&lt;/li&gt;
&lt;li&gt;If you must use subtransactions, keep them well below 64 per session — preferably much lower.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Subtransactions have caused countless production issues worldwide, with many case studies and analyses. To quote: &amp;ldquo;Subtransactions are basically cursed. Rip &amp;rsquo;em out.&amp;rdquo;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Subtransaction References
 &lt;div id="subtransaction-references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#subtransaction-references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful" target="_blank" rel="noreferrer"&gt;https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fluca1978.github.io/2020/02/05/PLPGSQLExceptions.html" target="_blank" rel="noreferrer"&gt;https://fluca1978.github.io/2020/02/05/PLPGSQLExceptions.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/" target="_blank" rel="noreferrer"&gt;https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Books:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;The Internals of PostgreSQL&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL in Action&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Internals: Deep Dive into Transaction Processing&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;PostgreSQL Database Kernel Analysis&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf" target="_blank" rel="noreferrer"&gt;https://edu.postgrespro.com/postgresql_internals-14_parts1-2_en.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Official resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Concurrency_control" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Concurrency_control&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Hint_Bits" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Hint_Bits&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/10/storage-page-layout.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/10/storage-page-layout.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/pageinspect.html3" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/pageinspect.html3&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Essential PostgreSQL transaction reads (interdb):&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Source code experts:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/102920988" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/102920988&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/127955762" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/127955762&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/125023923" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/125023923&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL snapshot optimization performance comparison:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462" target="_blank" rel="noreferrer"&gt;https://techcommunity.microsoft.com/t5/azure-database-for-postgresql/improving-postgres-connection-scalability-snapshots/ba-p/1806462&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Other resources:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://brandur.org/postgres-atomicity" target="_blank" rel="noreferrer"&gt;https://brandur.org/postgres-atomicity&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/j-8uRuZDRf4mHIQR_ZKIEg&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/postgrechina/article/details/49130743?spm=a2c6h.12873639.article-detail.7.41b32cda2KR1QM" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/postgrechina/article/details/49130743?spm=a2c6h.12873639.article-detail.7.41b32cda2KR1QM&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://mysql.taobao.org/monthly/2018/12/02/" target="_blank" rel="noreferrer"&gt;http://mysql.taobao.org/monthly/2018/12/02/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>Analyzing a 5MB SQL That Consumed 70GB of Memory</title><link>https://lastdba.com/en/2024/08/12/analyzing-a-5mb-sql-that-consumed-70gb-of-memory/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/analyzing-a-5mb-sql-that-consumed-70gb-of-memory/</guid><description>&lt;h3 class="relative group"&gt;Process Memory Analysis
 &lt;div id="process-memory-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#process-memory-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;WAL writer process (PID 66902) was terminated by signal 6: Aborted&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The log shows postmaster process 66902 was killed.&lt;/p&gt;
&lt;p&gt;Checking OS-level process memory: since &lt;code&gt;top&lt;/code&gt; doesn&amp;rsquo;t show PPID and &lt;code&gt;ps&lt;/code&gt; doesn&amp;rsquo;t show USS, we need both:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 8.7 10.6 &lt;span style="color:#ae81ff"&gt;57488380&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56389972&lt;/span&gt; - R 17:13:03 00:02:47 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;211277&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 7.8 9.6 &lt;span style="color:#ae81ff"&gt;52294700&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;51127480&lt;/span&gt; - R 17:13:03 00:02:31 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;222749&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 22.7 9.3 &lt;span style="color:#ae81ff"&gt;51320000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49073368&lt;/span&gt; - R 17:35:33 00:02:09 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;39513&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 2.9 6.8 &lt;span style="color:#ae81ff"&gt;38651084&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36354736&lt;/span&gt; ep_poll S 16:13:03 00:02:43 postgres: idle&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Using PPID to identify high-memory backend processes. Let&amp;rsquo;s examine process 211276:&lt;/p&gt;</description><content:encoded>
&lt;h3 class="relative group"&gt;Process Memory Analysis
 &lt;div id="process-memory-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#process-memory-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;WAL writer process (PID 66902) was terminated by signal 6: Aborted&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;postmaster&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The log shows postmaster process 66902 was killed.&lt;/p&gt;
&lt;p&gt;Checking OS-level process memory: since &lt;code&gt;top&lt;/code&gt; doesn&amp;rsquo;t show PPID and &lt;code&gt;ps&lt;/code&gt; doesn&amp;rsquo;t show USS, we need both:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 8.7 10.6 &lt;span style="color:#ae81ff"&gt;57488380&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56389972&lt;/span&gt; - R 17:13:03 00:02:47 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;211277&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 7.8 9.6 &lt;span style="color:#ae81ff"&gt;52294700&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;51127480&lt;/span&gt; - R 17:13:03 00:02:31 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;222749&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 22.7 9.3 &lt;span style="color:#ae81ff"&gt;51320000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49073368&lt;/span&gt; - R 17:35:33 00:02:09 postgres: BIND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres &lt;span style="color:#ae81ff"&gt;39513&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;66478&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; 2.9 6.8 &lt;span style="color:#ae81ff"&gt;38651084&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;36354736&lt;/span&gt; ep_poll S 16:13:03 00:02:43 postgres: idle&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Using PPID to identify high-memory backend processes. Let&amp;rsquo;s examine process 211276:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ zcat /osw/oswtop/toposw.dat.gz |grep &lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3271756&lt;/span&gt; 1.1g 1.1g S 7.3 0.2 0:03.93 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3291784&lt;/span&gt; 1.3g 1.2g R 96.4 0.2 0:11.87 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7369628&lt;/span&gt; 6.0g 2.1g R 100.0 1.2 0:46.58 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 17.0g 15.9g 2.1g R 100.0 3.2 1:16.70 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 28.8g 27.7g 2.1g R 100.0 5.5 1:46.82 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 41.4g 40.4g 2.1g R 100.0 8.0 2:16.99 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 54.7g 53.7g 2.1g R 88.8 10.7 2:47.60 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 66.5g 64.9g 2.1g R 34.7 12.9 3:22.76 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 71.0g 68.2g 2.1g R 99.1 13.6 3:52.94 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 74.9g 71.2g 2.1g R 100.0 14.2 4:23.05 postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;211276&lt;/span&gt; postgres &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; R 100.0 0.0 4:45.65 postgres&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;We can estimate private memory via &lt;code&gt;RES - SHR = USS&lt;/code&gt;. Process 211276&amp;rsquo;s memory ballooned from ~1GB to ~70GB within minutes, then crashed. All memory growth was private process memory.&lt;/p&gt;

&lt;h3 class="relative group"&gt;SQL Analysis
 &lt;div id="sql-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The PostgreSQL log shows a &lt;strong&gt;5MB SQL&lt;/strong&gt; containing &lt;strong&gt;5,000+ UNION ALLs&lt;/strong&gt; and &lt;strong&gt;30,000+ bind variables&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The execution plan is over 70,000 lines long:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;218196&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;218216&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1318&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1628&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; table1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table1nfo (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((col1)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((colcolcol)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; table1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table1nfo table1nfo_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((col1)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((colcolcol)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;10544&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10543&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; table2 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table2col t_1317 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((ididid)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;xxx&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((idididid)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The plan structure is simple: ~10,000 sub-plans fetching data, then an Append to combine results.&lt;/p&gt;
&lt;p&gt;This SQL monstrosity pushed a single backend process to 70GB. The root cause is clear: reduce the UNION ALLs and the problem goes away (which is indeed what happened). But if we dig deeper, many interesting questions arise:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Why did a 5MB SQL consume 70GB of memory?&lt;/li&gt;
&lt;li&gt;Is the data itself related to memory usage? Was it caused by returning too many rows?&lt;/li&gt;
&lt;li&gt;Is the memory from parsing cache or plan cache?&lt;/li&gt;
&lt;li&gt;Why didn&amp;rsquo;t &lt;code&gt;work_mem&lt;/code&gt; limit the operation memory, even though it&amp;rsquo;s set to a reasonable value?&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;Initial Analysis
 &lt;div id="initial-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#initial-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A 5MB SQL cached in a backend would at minimum contain: metadata, parsed SQL, and plan cache information.&lt;/p&gt;
&lt;p&gt;We&amp;rsquo;ve seen cases before where metadata cache (relcache) for hundreds of thousands of tables/partitions caused huge backend memory. But this database has few tables, so relcache can be preliminarily ruled out (later confirmed by memory dump).&lt;/p&gt;
&lt;p&gt;Parsed SQL data shouldn&amp;rsquo;t be that large — a 5MB SQL parsed shouldn&amp;rsquo;t produce 70GB.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;work_mem limitations and more:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;work_mem&lt;/code&gt; only limits per-operation memory for sort and hash operations. This creates the &amp;ldquo;multiple sort/hash&amp;rdquo; problem: a single SQL with many sorts can use &lt;code&gt;work_mem&lt;/code&gt; × N. PG 13 introduced &lt;code&gt;hash_mem_multiplier&lt;/code&gt; to cap hash usage within one statement. But what about sorts? Currently no multiplier for sorts, though in practice it&amp;rsquo;s less of a problem — statements with dozens of sort nodes are rare, as they carry high cost, and the optimizer tends to place sorts late in the plan.&lt;/p&gt;
&lt;p&gt;Here, &lt;code&gt;work_mem&lt;/code&gt; is 128MB and the instance is PG 13+ with &lt;code&gt;hash_mem_multiplier=1&lt;/code&gt;, so mass hash memory consumption can be ruled out. Furthermore, the execution plan above has &lt;strong&gt;zero sort or hash operations&lt;/strong&gt;, confirming this is not a sort/hash issue.&lt;/p&gt;
&lt;p&gt;So the earlier question: &lt;em&gt;&amp;ldquo;Why didn&amp;rsquo;t work_mem limit operation memory?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Because the SQL only has UNION ALL — no sort or hash operations at all. &lt;code&gt;work_mem&lt;/code&gt; does not constrain memory here.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Other plan nodes:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;No matter what, &lt;code&gt;work_mem&lt;/code&gt; only (!) limits sort/hash. There are dozens of plan node types — are the rest all unconstrained?&lt;/p&gt;

&lt;h3 class="relative group"&gt;Reproduction and Deep Analysis
 &lt;div id="reproduction-and-deep-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reproduction-and-deep-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;

&lt;h4 class="relative group"&gt;Empty Table Reproduction
 &lt;div id="empty-table-reproduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#empty-table-reproduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create empty table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1(col1 varchar(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Query with many UNION ALLs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;union&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;union&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;all&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...(&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UNION&lt;/span&gt; ALLs, &lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;size&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;KB)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(Too many UNION ALLs may exceed &lt;code&gt;max_stack_depth&lt;/code&gt;)&lt;/p&gt;
&lt;p&gt;An empty table + many UNION ALLs immediately reproduces the memory spike. Moreover, after the SQL completes, the backend memory is reclaimed.&lt;/p&gt;
&lt;p&gt;Since this is an empty table (0KB data file), we can rule out data as the cause. So: &lt;em&gt;&amp;ldquo;Is the data itself related to memory? Was it caused by returning too many rows?&amp;rdquo;&lt;/em&gt; — &lt;strong&gt;No, data is not the main factor.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;Strace System Call Analysis
 &lt;div id="strace-system-call-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#strace-system-call-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;While executing the SQL, capture system calls with &lt;code&gt;strace -p&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; strace -p &lt;span style="color:#ae81ff"&gt;198337&lt;/span&gt; &amp;gt; strace.198337 2&amp;gt;&amp;amp;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Quick primer on relevant Linux syscalls:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man2/epoll_wait.2.html" target="_blank" rel="noreferrer"&gt;epoll_wait&lt;/a&gt;: Wait for an event. Idle processes sit in this state.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man3/recvfrom.3p.html" target="_blank" rel="noreferrer"&gt;recvfrom&lt;/a&gt;: Receive a message from a socket.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man3/recvfrom.3p.html" target="_blank" rel="noreferrer"&gt;getrusage&lt;/a&gt;: Get resource usage.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man2/brk.2.html" target="_blank" rel="noreferrer"&gt;brk&lt;/a&gt;: Program break. Increasing it allocates memory to the process; decreasing it deallocates. &lt;code&gt;malloc&lt;/code&gt; ultimately calls &lt;code&gt;brk&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man2/lseek.2.html" target="_blank" rel="noreferrer"&gt;lseek&lt;/a&gt;: Reposition file offset.&lt;/li&gt;
&lt;li&gt;&lt;a href="" &gt;write&lt;/a&gt;: Write to a file descriptor. Does not guarantee disk write.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://man7.org/linux/man-pages/man3/sendto.3p.html" target="_blank" rel="noreferrer"&gt;sendto&lt;/a&gt;: Send a message on a socket.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syscalls like &lt;code&gt;lseek&lt;/code&gt;, &lt;code&gt;write&lt;/code&gt;, &lt;code&gt;sendto&lt;/code&gt; include fd (file descriptor) information:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;/proc/[pid]/fd&lt;/code&gt; caches the process&amp;rsquo;s file descriptors. We can map an fd back to a relation — fd 37 is table &lt;code&gt;lzl1&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cd /proc/198337/fd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lrwx------ &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt; Jan &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; 22:59 &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt; -&amp;gt; /pgdata/lzl/data13/base/16385/16386
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ oid2name -d lzldb -f &lt;span style="color:#ae81ff"&gt;16386&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;From database &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filenode Table Name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16386&lt;/span&gt; lzl1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The strace output is dense but structurally simple:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;strace: Process &lt;span style="color:#ae81ff"&gt;198337&lt;/span&gt; attached
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;epoll_wait&lt;span style="color:#f92672"&gt;(&lt;/span&gt;4, &lt;span style="color:#f92672"&gt;[{&lt;/span&gt;EPOLLIN, &lt;span style="color:#f92672"&gt;{&lt;/span&gt;u32&lt;span style="color:#f92672"&gt;=&lt;/span&gt;44314568, u64&lt;span style="color:#f92672"&gt;=&lt;/span&gt;44314568&lt;span style="color:#f92672"&gt;}}]&lt;/span&gt;, 1, -1&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34;Q\0\2p\372select col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34; all\nselect col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34; all\nselect col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34; all\nselect col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34; all\nselect col1 from lzl1 union&amp;#34;&lt;/span&gt;..., 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4347&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x34d5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x3cd5000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x3cd5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x3cd5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x88cd6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x894d6000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x894d6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x89cd6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a4d6000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a4d6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a4d6000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a516000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a556000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a556000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;## step5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;write&lt;span style="color:#f92672"&gt;(&lt;/span&gt;2, &lt;span style="color:#e6db74"&gt;&amp;#34;2024-01-26 23:08:01.800 CST [198&amp;#34;&lt;/span&gt;..., 165521&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;165521&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a556000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a57d000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a57d000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a57d000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a59f000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a59f000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8d449000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8d46b000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8d46b000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8d46b000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8d48d000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8d48d000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#step6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lseek&lt;span style="color:#f92672"&gt;(&lt;/span&gt;37, 0, SEEK_END&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#step7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8dcb1000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8dcb1000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8c179000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8c179000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8c179000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8c179000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8c179000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x8a526000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x8a526000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;0x34d5000&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x34d5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;brk&lt;span style="color:#f92672"&gt;(&lt;/span&gt;NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; 0x34d5000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#step8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto&lt;span style="color:#f92672"&gt;(&lt;/span&gt;8, &lt;span style="color:#e6db74"&gt;&amp;#34;\2\0\0\0\230\0\0\0\1@\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0&amp;#34;&lt;/span&gt;..., 152, 0, NULL, 0&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;sendto&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, &lt;span style="color:#e6db74"&gt;&amp;#34;T\0\0\0\35\0\1col1\0\0\0\0\0\0\0\0\0\4\23\377\377\0\0\0\5\0\0C\0&amp;#34;&lt;/span&gt;..., 50, 0, NULL, 0&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#step9&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;recvfrom&lt;span style="color:#f92672"&gt;(&lt;/span&gt;9, 0xddcf60, 8192, 0, NULL, NULL&lt;span style="color:#f92672"&gt;)&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; -1 EAGAIN &lt;span style="color:#f92672"&gt;(&lt;/span&gt;Resource temporarily unavailable&lt;span style="color:#f92672"&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;epoll_wait&lt;span style="color:#f92672"&gt;(&lt;/span&gt;4, strace: Process &lt;span style="color:#ae81ff"&gt;198337&lt;/span&gt; detached
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &amp;lt;detached ...&amp;gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Receive the UNION ALL SQL from fd=9 socket&lt;/li&gt;
&lt;li&gt;&lt;code&gt;brk&lt;/code&gt; allocates memory: process memory grows from 0x34d5000 (54MB) to 0x894d6000 (2.1GB)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;lseek&lt;/code&gt; on table &lt;code&gt;lzl1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Memory grows ~4MB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;write&lt;/code&gt; to fd=2 (log file); memory grows ~48MB&lt;/li&gt;
&lt;li&gt;&lt;code&gt;lseek&lt;/code&gt; on table &lt;code&gt;lzl1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Memory peaks at 0x8dcb1000 (2.1GB), then &lt;code&gt;brk&lt;/code&gt; releases memory back down to 0x34d5000 (54MB) — exactly matching the start&lt;/li&gt;
&lt;li&gt;Send result via socket&lt;/li&gt;
&lt;li&gt;Receive empty message from fd=9&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The strace doesn&amp;rsquo;t reveal much beyond the OS allocating and releasing memory for the process.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Memory Dump Analysis
 &lt;div id="memory-dump-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#memory-dump-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;code&gt;pmap&lt;/code&gt; of the process during the memory spike:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl pg_log&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pmap -x &lt;span style="color:#ae81ff"&gt;76207&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;76207: postgres: postgres lzldb &lt;span style="color:#f92672"&gt;[&lt;/span&gt;local&lt;span style="color:#f92672"&gt;]&lt;/span&gt; SELECT 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Address Kbytes RSS Dirty Mode Mapping
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0000000000400000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7984&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2192&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000dcc000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; r---- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000dcd000 &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; rw--- postgres
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000000ddc000 &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000001e49000 &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;224&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;224&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;0000000001e8b000 &lt;span style="color:#ae81ff"&gt;1812380&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; rw--- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ffffffffff600000 &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; r-x-- &lt;span style="color:#f92672"&gt;[&lt;/span&gt; anon &lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------- ------- ------- ------- 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;total kB &lt;span style="color:#ae81ff"&gt;2089384&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1810232&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1807384&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;pmap&lt;/code&gt; doesn&amp;rsquo;t label the segments, but we can see the largest segment starts at address 0x1e49000. Checking &lt;code&gt;smaps&lt;/code&gt; for more detail:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl 76207&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ cat smaps |grep 1e49000 -A &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;01e49000-01e8b000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;264&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;01e8b000-70872000 rw-p &lt;span style="color:#ae81ff"&gt;00000000&lt;/span&gt; 00:00 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;heap&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Size: &lt;span style="color:#ae81ff"&gt;1812380&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Rss: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Pss: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Shared_Dirty: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Clean: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Private_Dirty: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Referenced: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Anonymous: &lt;span style="color:#ae81ff"&gt;1804400&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;AnonHugePages: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Swap: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;KernelPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;MMUPageSize: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; kB&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Heap segment. PSS (private memory): 1.8GB!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;(I tried using gdb to dump the 0x1e8b000-0x70872000 segment but it failed — not sure why. Suggestions welcome!)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;gcore&lt;/code&gt; for a rough dump:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ gcore -o /pgdata/lzl/gcore.dump &lt;span style="color:#ae81ff"&gt;76207&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ strings gcore.dump.76207&amp;gt; text.dump.76207
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ ll -h
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r----- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres 2.0G Jan &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; 17:29 gcore.dump.76207
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r----- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres 5.2M Jan &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; 17:30 text.dump.76207&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2GB virtual memory allocated, 1.8GB physical memory occupied — but only 5.2MB of actual data stored!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A rough &lt;code&gt;hexdump&lt;/code&gt; reveals many memory holes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@lzl lzl&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ hexdump -C gcore.dump.76207 |head -10000 |grep &lt;span style="color:#e6db74"&gt;&amp;#34;00 00 00 00 00 00 00 00&amp;#34;&lt;/span&gt;|wc -l
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3690&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;log_planner_stats and Other Info
 &lt;div id="log_planner_stats-and-other-info" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#log_planner_stats-and-other-info" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;To verify whether the plan cache is the culprit, enable logging for parse, planner, and executor phases:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_parser_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_planner_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; log_executor_stats &lt;span style="color:#f92672"&gt;=&lt;/span&gt; on&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The logs show the parse phase uses little memory, while the planner consumes significantly more.&lt;/p&gt;
&lt;p&gt;Planner stats log:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-01-26 18:01:41.592 CST &lt;span style="color:#f92672"&gt;[&lt;/span&gt;208503&lt;span style="color:#f92672"&gt;]&lt;/span&gt; LOG: PLANNER STATISTICS
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-01-26 18:01:41.592 CST &lt;span style="color:#f92672"&gt;[&lt;/span&gt;208503&lt;span style="color:#f92672"&gt;]&lt;/span&gt; DETAIL: ! system usage stats:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! 0.048955 s user, 0.004996 s system, 0.054077 s elapsed
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! &lt;span style="color:#f92672"&gt;[&lt;/span&gt;11.208034 s user, 1.313838 s system total&lt;span style="color:#f92672"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! &lt;span style="color:#ae81ff"&gt;2255352&lt;/span&gt; kB max resident size
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/352&lt;span style="color:#f92672"&gt;]&lt;/span&gt; filesystem blocks in/out
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! 0/1315 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/563859&lt;span style="color:#f92672"&gt;]&lt;/span&gt; page faults/reclaims, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; swaps
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; signals rcvd, 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;0/0&lt;span style="color:#f92672"&gt;]&lt;/span&gt; messages rcvd/sent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ! 0/0 &lt;span style="color:#f92672"&gt;[&lt;/span&gt;1/16&lt;span style="color:#f92672"&gt;]&lt;/span&gt; voluntary/involuntary context switches&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;2GB max resident size — consistent with the RES observed from the OS. This answers: &lt;em&gt;&amp;ldquo;Is the memory from parsing cache or plan cache?&amp;rdquo;&lt;/em&gt; — &lt;strong&gt;The planner phase consumes the memory.&lt;/strong&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;Inspecting TopMemoryContext
 &lt;div id="inspecting-topmemorycontext" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inspecting-topmemorycontext" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL manages backend private memory through MemoryContext. We can dump &lt;code&gt;TopMemoryContext&lt;/code&gt; via gdb:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;TopMemoryContext: &lt;span style="color:#ae81ff"&gt;101488&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;48464&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;53024&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pgstat TabStatusArray lookup hash table: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;1408&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;6784&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TopTransactionContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;7720&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;472&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TableSpace cache: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;2048&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;6144&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RowDescriptionContext: &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;6880&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1312&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; MessageContext: &lt;span style="color:#ae81ff"&gt;1854981336&lt;/span&gt; total in &lt;span style="color:#ae81ff"&gt;235&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;7911304&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1847070032&lt;/span&gt; used
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Grand total: &lt;span style="color:#ae81ff"&gt;1856104056&lt;/span&gt; bytes in &lt;span style="color:#ae81ff"&gt;431&lt;/span&gt; blocks; &lt;span style="color:#ae81ff"&gt;8226712&lt;/span&gt; free &lt;span style="color:#f92672"&gt;(&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; chunks&lt;span style="color:#f92672"&gt;)&lt;/span&gt;; &lt;span style="color:#ae81ff"&gt;1847877344&lt;/span&gt; used&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;MessageContext&lt;/strong&gt; accounts for 1.8GB — the largest consumer.&lt;/p&gt;
&lt;p&gt;From &lt;code&gt;src/backend/utils/mmgr/README&lt;/code&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;MessageContext &amp;mdash; this context holds the current command message from the frontend, as well as any derived storage that need only live as long as the current message (for example, in simple-Query mode the parse and plan trees can live here). This context will be reset, and any children deleted, at the top of each cycle of the outer loop of PostgresMain. This is kept separate from per-transaction and per-portal contexts because a query string might need to live either a longer or shorter time than any single transaction or portal.&lt;/p&gt;
&lt;/blockquote&gt;&lt;blockquote&gt;&lt;p&gt;When creating a prepared statement, the parse and plan trees will be built in a temporary context that&amp;rsquo;s a child of MessageContext.&lt;/p&gt;
&lt;/blockquote&gt;&lt;ul&gt;
&lt;li&gt;&lt;code&gt;MessageContext&lt;/code&gt; caches messages from the frontend, including derived parse and plan tree data.&lt;/li&gt;
&lt;li&gt;Parse and plan trees are &lt;strong&gt;children&lt;/strong&gt; of &lt;code&gt;MessageContext&lt;/code&gt; — when &lt;code&gt;MessageContext&lt;/code&gt; is reclaimed, parse and plan trees are reclaimed too.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This also explains the private memory reclamation: the plan tree data produced during the planner phase is a child of &lt;code&gt;MessageContext&lt;/code&gt;. Once results are returned, &lt;code&gt;MessageContext&lt;/code&gt; is reset and all children are freed. This matches the strace observation where memory after release matches memory before allocation exactly.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Answering the final question: &lt;em&gt;&amp;ldquo;Why did a 5MB SQL consume 70GB of memory?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The overwhelming majority of memory was consumed during plan creation.&lt;/strong&gt; The planner allocated enormous amounts of memory. &lt;code&gt;work_mem&lt;/code&gt; and &lt;code&gt;hash_mem_multiplier&lt;/code&gt; can only constrain sort and hash operations — they cannot limit other memory operations during planning. The plan tree itself isn&amp;rsquo;t that large, but the allocation process creates massive &lt;strong&gt;memory holes&lt;/strong&gt;: megabyte-scale data (metadata, parse tree, plan tree, etc.) ends up stored in gigabyte-scale memory regions.&lt;/p&gt;
&lt;p&gt;These SQL, parse tree, and plan tree structures are all cached in &lt;code&gt;MessageContext&lt;/code&gt; and its children. Once the result is sent back to the client, all memory from this phase is reclaimed.&lt;/p&gt;</content:encoded></item><item><title>Book Notes — 2001: A Space Odyssey</title><link>https://lastdba.com/en/2024/08/12/book-notes-2001-a-space-odyssey/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-2001-a-space-odyssey/</guid><description>&lt;p&gt;​
​​&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; Arthur C. Clarke's masterpiece — a work no sci-fi fan can afford to skip. I'd long heard of its reputation, but having already seen the film adaptation, I felt it lacked some novelty, so the book just sat on my shelf unread. But after reading it, I can say with complete confidence: every page is filled with freshness — the kind of dopamine-driven reading that makes it impossible to put down.
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 class="relative group"&gt;God-Tier Predictions
 &lt;div id="god-tier-predictions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#god-tier-predictions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This book was published in the 1960s — more than 60 years ago from now (2023). What is science fiction? Sci-fi makes reasonably plausible predictions about the future based on current science. And the author, living in the 1960s, imagined humanity&amp;rsquo;s space exploration in the year 2000. We, living in the present, are perfectly positioned to verify his &amp;ldquo;future world.&amp;rdquo;&lt;/p&gt;</description><content:encoded>&lt;p&gt;​
​​&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; Arthur C. Clarke's masterpiece — a work no sci-fi fan can afford to skip. I'd long heard of its reputation, but having already seen the film adaptation, I felt it lacked some novelty, so the book just sat on my shelf unread. But after reading it, I can say with complete confidence: every page is filled with freshness — the kind of dopamine-driven reading that makes it impossible to put down.
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 class="relative group"&gt;God-Tier Predictions
 &lt;div id="god-tier-predictions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#god-tier-predictions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This book was published in the 1960s — more than 60 years ago from now (2023). What is science fiction? Sci-fi makes reasonably plausible predictions about the future based on current science. And the author, living in the 1960s, imagined humanity&amp;rsquo;s space exploration in the year 2000. We, living in the present, are perfectly positioned to verify his &amp;ldquo;future world.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Of course, these prophetic predictions aren&amp;rsquo;t perfectly accurate. For example, his forecast for manned space travel is clearly a bit too optimistic. After the Apollo program ended, we&amp;rsquo;ve never again undertaken a practice that breaks free from Earth&amp;rsquo;s bounds — not even returning to the Moon&amp;hellip;&lt;/p&gt;
&lt;p&gt;In the novel&amp;rsquo;s year 2000, humanity already has a luxurious Moon base and dispatches astronauts aboard a spacecraft bound for Jupiter.&lt;/p&gt;
&lt;p&gt;But you can&amp;rsquo;t really blame the author. The book was published in 1968, and the very next year, humans landed on the Moon. Given a few more decades, landing on Jupiter should&amp;rsquo;ve been feasible, right?&lt;/p&gt;
&lt;p&gt;The novel contains many astonishing predictions — here are a few that left a deep impression:&lt;/p&gt;
&lt;p&gt;Population: Arthur C. Clarke predicted with stunning accuracy that the global population would explode to 6 billion by 2000 (in the 1960s it was 3 billion). He even foresaw certain countries implementing birth control due to overpopulation, limiting families to two children. (Clearly conservative, right&amp;hellip; The Celestial Empire had already started family planning, and only one child was allowed — until young people stopped wanting children altogether.)&lt;/p&gt;
&lt;p&gt;Pandemic control: In the year 2000, a global pandemic spreads, with quarantine zones set up everywhere&amp;hellip; (I have no f***ing words.)&lt;/p&gt;
&lt;p&gt;Artificial intelligence: In 1946, von Neumann invented the computer — the concept was just emerging — yet Arthur C. Clarke was already emphasizing the concept of artificial intelligence, predicting AI&amp;rsquo;s control over vast, complex systems. Even more remarkably, he had already imagined AI potentially rebelling against humans&amp;hellip; ChatGPT was only recognized this year. The more you think about it, the more chilling it gets~&lt;/p&gt;
&lt;p&gt;Tablet computers: Home computers didn&amp;rsquo;t appear until the 1980s, yet in the novel, people are already using tablet computers to control system inputs and read the news&amp;hellip; Because the novel is so hardcore, Clarke even describes switching between a news homepage and category pages on a tablet, with data analysis delivering content tailored to the user&amp;hellip;&lt;/p&gt;
&lt;p&gt;Triple-site mirroring: As a DBA, I&amp;rsquo;m hyper-sensitive to this term. The author describes data center mirror backups, with data split into three identical copies stored in different locations on Earth for disaster recovery&amp;hellip; I&amp;rsquo;m not entirely sure when concepts like &amp;ldquo;two-site-three-center&amp;rdquo; or &amp;ldquo;three-site-five-center&amp;rdquo; were first proposed (though I imagine not long ago), but seeing the novel describe data mirroring and remote disaster recovery in such detail genuinely struck a chord with my DBA instincts.&lt;/p&gt;
&lt;p&gt;Reading this masterpiece, my state of mind was: shock, then more shock, then nonstop shock~ How did Arthur C. Clarke, in the 1960s, conceive of this future world? Unimaginable. No wonder some people say: &amp;ldquo;Arthur C. Clarke time-traveled to the present, then went back to the 1960s to write this work.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Imagination
 &lt;div id="imagination" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#imagination" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt; If it were merely scientific prediction, it couldn't truly be called science fiction. Sci-fi can't just be cold scientific extrapolation — it needs a touch of humanistic distillation, a bit of imagination that departs from science, like Liu Cixin's portrayals of human nature. This element of imagination beyond science is precisely what ultimately determines a sci-fi work's stature.

 And the ultimate imaginative conceit of *2001: A Space Odyssey* is the TMA-1 monolith and the Star Child. The TMA-1 monolith is an alien artifact that catalyzes human evolution, and it simultaneously represents the vast gap between human science and alien science. The entire novel revolves around this monolith — it is the very core of the entire sci-fi story. In fact, the monolith only appears at two points in time: the ape-men era and the beginning of humanity's space exploration. When the ape-men first encounter the monolith, their physical structure undergoes subtle changes — their hands become more dexterous, their brains begin to think. The author then uses several chapters to describe the ape-men's transformation:

 1. This group of ape-men masters tools. In a confrontation with a leopard, for the first time in history, they gain the upper hand — marking the first time they stand at the top of the food chain, no longer prey.

 2. This group of ape-men decisively triumphs in a struggle against another group of apes — marking their transformation from ape-men into humans.

 Then, the novel leaps over millions of years of human history, cutting directly to the era of space travel. This technique is utterly brilliant~

 The second time: a lone human, after countless hardships, reaches the monolith on Saturn (Jupiter in the film). The protagonist passes through a wormhole pre-arranged by the alien beings, experiences a journey through space, witnesses many wondrous cosmic spectacles, and finally falls into a room — the Star Child is born!

 The alien life guided ape-men to become humans, then guided humans to become the Star Child. The Star Child is pure imagination — built on the analogy of ape-men becoming humans, marked by the TMA-1 monolith. Imaginative elements are added perfectly and naturally, leaving a profound, lingering aftertaste. Worthy of being a seminal work in science fiction.
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 class="relative group"&gt;Old Liu (Liu Cixin)
 &lt;div id="old-liu-liu-cixin" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#old-liu-liu-cixin" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt; I read quite a few of Liu Cixin's works during university — *The Three-Body Problem*, *The Wandering Earth*, *Ball Lightning*, *Earth Cannon*... I really like *The Three-Body Problem*, but I have no interest in the excessive factional disputes in the first book — I even found them a bit contrived. However, the concept of understanding Trisolaran society through the Three-Body game is brilliantly executed. *The Dark Forest* is clearly much better — arguably the most thrilling book in the trilogy. Back when I finished these works, I had a feeling *The Wandering Earth* might be adapted into a film; the others seemed harder to film...

 Liu Cixin's sci-fi works feature strong narrative suspense and abundant human conflict, focusing more on human behavior against a cosmic backdrop. Arthur C. Clarke's works, by contrast, rarely dwell on interpersonal relationships. He prefers depicting the face of future society and the bizarre wonders of stars, planets, and space travel.

 Many parts of Liu Cixin's work clearly show the influence of *Space Odyssey*. When Clarke describes TMA-1, he uses the word &amp;quot;smooth&amp;quot; — clearly the &amp;quot;droplet&amp;quot; in *The Three-Body Problem* references this concept. Both are technological products of alien civilizations beyond human comprehension, though their purposes are vastly different~

 Speaking of which, Liu Cixin hasn't released a new work in over a decade — what's he up to...
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 class="relative group"&gt;The Film — 2001: A Space Odyssey
 &lt;div id="the-film--2001-a-space-odyssey" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-film--2001-a-space-odyssey" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt; Released in 1968, another masterpiece by Kubrick — the god of sci-fi meets the god of cinema.

 That iconic BGM swells~ When the ape-man throws the bone — the tool — into the sky, and as it falls, the shot cuts to millions of years later... An exquisitely brilliant piece of cinematic language, truly stirring~

 When I first watched this film, there were many parts I didn't fully understand. After reading the novel, everything falls into place. The film also adds many classic scenes, such as:

 1. The depiction of Earth's orbital space in the year 2000. After over 30 years of development, humanity has launched countless capsules into space — the sky is filled with all manner of spacecraft. This sequence was frequently referenced before the year 2000.

 2. HAL 9000 reading the astronauts' lips and learning they plan to shut him down. I assumed this scene was in the novel, but the book's portrayal of taking down the AI is far more circuitous. Both are brilliant, though. (The film *The Wandering Earth*'s MOSS pays heavy homage to HAL 9000.)
&lt;/code&gt;&lt;/pre&gt;

&lt;h2 class="relative group"&gt;Closing
 &lt;div id="closing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#closing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;pre&gt;&lt;code&gt; *Space Odyssey* perfectly embodies what hard sci-fi should be: god-tier predictions about the future, paired with a finishing touch of pure imagination. I read this book far too late — I absolutely must read the sequels soon!
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;​&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Are We Smart Enough to Know How Smart Animals Are?</title><link>https://lastdba.com/en/2024/08/12/book-notes-are-we-smart-enough-to-know-how-smart-animals-are/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-are-we-smart-enough-to-know-how-smart-animals-are/</guid><description>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/921a326cbe1c.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;My previous book was &lt;em&gt;Wild&lt;/em&gt; — the Pacific Crest Trail queen mentioned this book, &lt;em&gt;Are We Smart Enough to Know How Smart Animals Are?&lt;/em&gt;, noting how she&amp;rsquo;d read it page by page, tearing each one out after reading. I wonder: as she journeyed through mountains and forests, hearing birdsong and streams, did reading this book about how clever animals are feel especially resonant?&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d previously read &lt;em&gt;Sapiens&lt;/em&gt; (I can&amp;rsquo;t help recommending this book — it&amp;rsquo;s incredible). That book starts from when humans first stood upright and traces our journey until we gradually became gods&amp;hellip; What exactly makes humans different — what allows us to stand out from the myriad of living creatures?&lt;/p&gt;</description><content:encoded>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/921a326cbe1c.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;My previous book was &lt;em&gt;Wild&lt;/em&gt; — the Pacific Crest Trail queen mentioned this book, &lt;em&gt;Are We Smart Enough to Know How Smart Animals Are?&lt;/em&gt;, noting how she&amp;rsquo;d read it page by page, tearing each one out after reading. I wonder: as she journeyed through mountains and forests, hearing birdsong and streams, did reading this book about how clever animals are feel especially resonant?&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d previously read &lt;em&gt;Sapiens&lt;/em&gt; (I can&amp;rsquo;t help recommending this book — it&amp;rsquo;s incredible). That book starts from when humans first stood upright and traces our journey until we gradually became gods&amp;hellip; What exactly makes humans different — what allows us to stand out from the myriad of living creatures?&lt;/p&gt;
&lt;p&gt;The author, Frans de Waal, is an expert in primate behavior — the most cutting-edge and popular field within all animal behavior studies. Especially as experimental methods have improved, we&amp;rsquo;ve discovered that those traits humans keep proudly claiming as uniquely ours have all been found in other animal groups.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Apes
 &lt;div id="apes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#apes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This book is highly scientific, containing extensive descriptions of experiments, observations, and the development of biological science. Since it&amp;rsquo;s science, let&amp;rsquo;s learn something~ When you see the word &amp;ldquo;ape,&amp;rdquo; what kind of ape image comes to mind? Whatever it is, it&amp;rsquo;s not precise enough. Because &amp;ldquo;ape&amp;rdquo; is a general term — you can roughly divide apes into four types: chimpanzees, gorillas, orangutans, and gibbons (bonobos are likely a branch of chimpanzees, frequently mentioned in the book; I&amp;rsquo;ll set them aside for simplicity):&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Homo Sapiens&lt;/th&gt;
 &lt;th&gt;Chimpanzee&lt;/th&gt;
 &lt;th&gt;Gorilla&lt;/th&gt;
 &lt;th&gt;Orangutan&lt;/th&gt;
 &lt;th&gt;Gibbon&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;Homo Sapien&lt;/td&gt;
 &lt;td&gt;chimpanzee&lt;/td&gt;
 &lt;td&gt;gorilla&lt;/td&gt;
 &lt;td&gt;orangutan&lt;/td&gt;
 &lt;td&gt;hylobates&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;img src="https://lastdba.com/img/csdn/f164979861f5.png" alt="点击查看图片来源" style="zoom:180%;" /&gt; /&amp;gt;&lt;/td&gt;
 &lt;td&gt;&lt;img src="https://lastdba.com/img/csdn/d7747bc5f86e.png" alt="点击查看图片来源" style="zoom:80%;" /&gt; /&amp;gt;&lt;/td&gt;
 &lt;td&gt;&lt;img src="https://lastdba.com/img/csdn/6e6713e61e0e.png" alt="点击查看图片来源" style="zoom:160%;" /&gt; /&amp;gt;&lt;/td&gt;
 &lt;td&gt;


&lt;img src="https://lastdba.com/img/csdn/0e9a9c47a115.png" alt="img" /&gt;&lt;/td&gt;
 &lt;td&gt;&lt;img src="https://lastdba.com/img/csdn/7837fde1fba6.png" alt="点击查看图片来源" style="zoom:180%;" /&gt; /&amp;gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Kinship:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4ea31e990816.png" alt="https://upload.wikimedia.org/wikipedia/commons/thumb/a/a6/Hominoid_taxonomy_6.svg/800px-Hominoid_taxonomy_6.svg.png" /&gt;
Hominoidea means the family of &amp;ldquo;hominoids&amp;rdquo; — and yes, all these close relatives of ours belong to the hominid family! The other Homixxx entries are smaller tribal branches. From the family tree above, we can see that we Homo sapiens are most closely related to chimpanzees, with gorillas, orangutans, and gibbons increasingly distant.&lt;/p&gt;
&lt;p&gt;Evolutionary timeline:



&lt;img src="https://lastdba.com/img/csdn/966725f9850f.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;
&lt;p&gt;About six million years ago, we and chimpanzees were still the same species&amp;hellip; Chimpanzees are also universally recognized as the most intelligent animals. Did we really evolve from monkeys? This description isn&amp;rsquo;t quite accurate. Although the diagram above doesn&amp;rsquo;t mark monkeys, going further back we certainly share a common ancestor. But that doesn&amp;rsquo;t mean we evolved from monkeys — just like chimpanzees, we share a common ancestor that is now extinct. So we didn&amp;rsquo;t evolve from monkeys, but we and monkeys share a common ancestor — just two different branches. &amp;ldquo;Although for convenience we often use &amp;lsquo;animals&amp;rsquo; to refer to non-human species, it&amp;rsquo;s undeniable that humans are a kind of animal.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;What Makes Us Different?
 &lt;div id="what-makes-us-different" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-makes-us-different" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Tool use?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;After reading &lt;em&gt;Space Odyssey&lt;/em&gt;, I thought what made humans human was our learning to use tools. From the moment we grasped tools in our hands to crack open bone marrow, to humanity venturing into space to explore the unknown — all because we learned to use tools. But we can easily find similar behaviors in other animals. Chimpanzees use twigs to eat ants, and use branches as ladders to climb over walls. Even their thumbs, like ours, can grasp objects. Tool use is actually quite common in the animal kingdom. It seems tool use is not a uniquely human trait — those animals that also possess this skill haven&amp;rsquo;t developed higher civilizations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Cognitive Revolution?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;After reading &lt;em&gt;Sapiens&lt;/em&gt;, there was one particularly novel idea. I long and firmly believed it was correct: the Cognitive Revolution. The author argues that the Cognitive Revolution was the crucial juncture where Homo sapiens diverged dramatically from other animals. The Cognitive Revolution occurred before the Agricultural Revolution, when sapiens were still just hunters. The author gives a classic example: one person discovers a lion by the river, and returns to tell the rest of the tribe — &amp;ldquo;There&amp;rsquo;s a lion by the river.&amp;rdquo; At that moment, even though no one else has seen it with their own eyes, they all believe in their minds the concept of &amp;ldquo;there&amp;rsquo;s a lion by the river.&amp;rdquo; This transmission of belief later gave rise to religion, power, nations, currency, corporations, and other virtual concepts. &lt;em&gt;Are We Smart Enough to Know How Smart Animals Are?&lt;/em&gt; offers a counterexample: a monkey, being bullied by two others, cornered with no escape, lets out a &amp;ldquo;snake!&amp;rdquo; cry (the call they only make when they encounter snakes). The two other monkeys stop to check whether there really is a snake — only when they confirm there isn&amp;rsquo;t one do they resume the chase. Many observations show that numerous animals possess the ability to believe through others&amp;rsquo; stories.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Upright walking?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Upright walking freed our hands, and our brains grew increasingly developed. This is described in &lt;em&gt;Sapiens&lt;/em&gt;. In fact, bipedal walking isn&amp;rsquo;t as special as we imagine. Bonobos on the savannah can walk on two legs for extended periods.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Language?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Language was once thought to belong to humans alone. Just because we can&amp;rsquo;t understand what animals are saying doesn&amp;rsquo;t mean they lack simple language. Animals&amp;rsquo; various calls are not innate. When a chimpanzee grows up with one group, their calls in different situations are similar. If you place that chimpanzee in a different, unrelated chimpanzee group, researchers found their calls are completely different — and for a long time, that chimpanzee cannot integrate into the new group until it learns the new calling patterns. Some once believed language influences how we think. But to think, language is not a necessity. The ability of animals to add different numbers was once thought to depend on language, yet in an experiment, a chimpanzee successfully added numbers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cooperation?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;The Wandering Earth 2&lt;/em&gt; has this scene: a minister shows a fossilized human bone that was broken and healed — proof that this human suffered a severe injury. Among other animals, the injured would be abandoned, but this person received help from others and survived. Is cooperation the dividing line between humans and animals? Chimpanzee groups help elderly chimpanzees with limited mobility — bringing them food, feeding them water mouth-to-mouth.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Complex social relationships?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Chimpanzees not only know their own relationship with other chimpanzees, but also understand the relationships between B and C. Even when encountering an unfamiliar chimpanzee, they can assess its social status through how other chimpanzees treat it, and behave accordingly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Thinking about the future?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Absolutely no problem at all&amp;hellip;&lt;/p&gt;
&lt;p&gt;Plato proposed that humans are the only featherless bipeds. Diogenes then plucked a chicken and said: &amp;ldquo;Behold — Plato&amp;rsquo;s &amp;lsquo;man.&amp;rsquo;&amp;rdquo; We can keep adding qualifiers to this definition until we can no longer find a description that fits only humans and no other animal. Humans and animals are certainly different — of course we can find the most fitting description of humans from many perspectives. But isn&amp;rsquo;t that a bit too subjective?&lt;/p&gt;
&lt;p&gt;Although this book refutes various claims of difference, the author does not deny that humans are special. In some respects, we are clearly unique. But we have yet to find that distinguishing point — at least, no consensus has been reached. If we want to find the essential difference between humans and animals, we must first discard the presupposition that &amp;ldquo;humans are special.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Closing
 &lt;div id="closing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#closing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;First, a complaint about the Chinese translation — it screams machine translation. For example: &amp;ldquo;人们认为动物善于学习行为的普遍后果，但无法记住任何特定的联系&amp;rdquo; (&amp;ldquo;People believe animals are good at learning the general consequences of behavior, but cannot remember any specific connections&amp;rdquo;). It&amp;rsquo;s very hard to understand this sentence using direct Chinese thinking — it reads exactly like a machine-translated sentence. But if you think in English, it&amp;rsquo;s instantly clear: the sentence means &amp;ldquo;People believe animals are good at learning the consequences of behaviors but do not know the connection between the behavior and the consequence&amp;rdquo; (the author is refuting this statement).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Are We Smart Enough to Know How Smart Animals Are?&lt;/em&gt; has a strong academic atmosphere. It uses a wealth of reliable experiments and observations to explain the essence of animal behavior. Reading this book feels a bit like reading a paper — logically rigorous and cautiously worded in its claims. Primate studies, as the frontier of animal behavior research, hold great significance for studying human behavior — though some other animals&amp;rsquo; behaviors are also useful.&lt;/p&gt;
&lt;p&gt;The book contains many ideas that spark sudden flashes of insight: Clever Hans, the impossibility of equal testing environments for human infants and apes, the homology of all vertebrate brains, chimpanzees&amp;rsquo; astonishing memory and logical reasoning abilities, chimpanzee power struggles, and more. Frans de Waal&amp;rsquo;s other book &lt;em&gt;Chimpanzee Politics&lt;/em&gt; has already been added to my reading list&amp;hellip;&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Chimpanzee Politics: Power and Sex among Apes</title><link>https://lastdba.com/en/2024/08/12/book-notes-chimpanzee-politics-power-and-sex-among-apes/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-chimpanzee-politics-power-and-sex-among-apes/</guid><description>&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Frans de Waal&amp;rsquo;s seminal work &lt;em&gt;Chimpanzee Politics&lt;/em&gt; was published in 1982 — his first book and also recommended reading for incoming members of the U.S. Congress. Another work of his I read previously, &lt;em&gt;Are We Smart Enough to Know How Smart Animals Are?&lt;/em&gt;, was from 2016 — such a vast timespan between them. &lt;em&gt;Are We Smart Enough&lt;/em&gt; introduced many animal behaviors, including those of humanity&amp;rsquo;s numerous close relatives, while &lt;em&gt;Chimpanzee Politics&lt;/em&gt; focuses solely on our very closest relative — the chimpanzee. It observes a chimpanzee colony in a zoo and analyzes the structure, evolution, and behaviors of chimpanzee social power and politics.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Frans de Waal&amp;rsquo;s seminal work &lt;em&gt;Chimpanzee Politics&lt;/em&gt; was published in 1982 — his first book and also recommended reading for incoming members of the U.S. Congress. Another work of his I read previously, &lt;em&gt;Are We Smart Enough to Know How Smart Animals Are?&lt;/em&gt;, was from 2016 — such a vast timespan between them. &lt;em&gt;Are We Smart Enough&lt;/em&gt; introduced many animal behaviors, including those of humanity&amp;rsquo;s numerous close relatives, while &lt;em&gt;Chimpanzee Politics&lt;/em&gt; focuses solely on our very closest relative — the chimpanzee. It observes a chimpanzee colony in a zoo and analyzes the structure, evolution, and behaviors of chimpanzee social power and politics.&lt;/p&gt;
&lt;p&gt;If you see chimpanzees at the zoo mating brazenly in broad daylight without any inhibitions, or screaming and attacking one another — seemingly devoid of moral restraint, showing no trace of civilization — then the English title of &lt;em&gt;Are We Smart Enough&lt;/em&gt; serves as a perfect retort: &lt;em&gt;&amp;ldquo;Are We Smart Enough to Know How Smart Animals Are?&amp;rdquo;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/da1153babfbb.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Power and Alliances
 &lt;div id="power-and-alliances" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#power-and-alliances" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s commonly assumed that in animal social structures, the strongest male becomes the leader. This does broadly align with chimpanzee social structure. But it&amp;rsquo;s far from that simple — physical strength is not the sole factor determining dominance relationships. Alliances are the crucial factor, perhaps &lt;em&gt;the&lt;/em&gt; most important factor. The book spends extensive passages discussing &amp;ldquo;triangular relationships.&amp;rdquo; Here, I need to introduce the book&amp;rsquo;s three main chimpanzee protagonists:&lt;/p&gt;
&lt;p&gt;Yeroen (the elder) — Luit (the middle) — Nikkie (the young)&lt;/p&gt;
&lt;p&gt;These three male chimpanzees form a power center — the power core of this chimpanzee colony — and their political struggles play out on this political stage. All three have, at different times, been the colony&amp;rsquo;s alpha. Initially, the capable and broadly respected Yeroen was alpha. Then Luit took over. Finally, Nikkie established a puppet-style rule. They built a hierarchical organization and competed within it for dominance over the rest of the group.&lt;/p&gt;
&lt;p&gt;First: a male with superior fighting ability cannot simply usurp the group&amp;rsquo;s leadership. Power collapses not when a challenger defeats the current ruler in combat, but when the ruler can no longer protect other members of the society. During Luit&amp;rsquo;s bid for power, Luit and his ally Nikkie constantly attacked other group members, and when the Luit-Nikkie alliance was present together, Yeroen could not offer protection to others.&lt;/p&gt;
&lt;p&gt;The Luit-Nikkie alliance played a decisive role in toppling the Yeroen dynasty. But Yeroen&amp;rsquo;s fall from power also created new alliance opportunities — just like human politicians, chimpanzees seize such opportunities too. Yeroen found the key player in the current &amp;ldquo;triangular relationship&amp;rdquo;: Nikkie.&lt;/p&gt;
&lt;p&gt;Before Yeroen&amp;rsquo;s fall, Nikkie was Luit&amp;rsquo;s ally. Afterward, Nikkie became Yeroen&amp;rsquo;s ally. Why would the seasoned Yeroen support Nikkie after losing power?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For Nikkie: he went from number two to number one. He was the &amp;ldquo;person&amp;rdquo; most eager for Yeroen&amp;rsquo;s support.&lt;/li&gt;
&lt;li&gt;For Yeroen: an alliance with Nikkie secured his position as number two in the group, and Nikkie — relative to Yeroen — needed &lt;em&gt;his&lt;/em&gt; support more. Nikkie couldn&amp;rsquo;t openly oppose Yeroen, because if he did, Nikkie&amp;rsquo;s own position would become unstable. Yeroen gained more freedom of action and traded it for more mating opportunities with females.&lt;/li&gt;
&lt;li&gt;As for Luit: he dropped from the top of the power rankings to number three.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Yeroen-Nikkie alliance, though tight, featured a very cunning Yeroen. Although Yeroen&amp;rsquo;s relationship with Luit was terrible, Yeroen would still proactively approach Luit — and Nikkie would invariably intervene, without exception. Why did Yeroen seek contact with Luit? Yeroen approached Luit precisely to put on a show for Nikkie. For Nikkie, Yeroen&amp;rsquo;s behavior served as a constant reminder that Nikkie&amp;rsquo;s position depended entirely on Yeroen&amp;rsquo;s choices. The young Nikkie lacked strong grassroots support from the group. The seasoned, cunning Yeroen held Nikkie in the palm of his hand — Nikkie&amp;rsquo;s ruling foundation did not rest under his own feet.&lt;/p&gt;
&lt;p&gt;When one chimpanzee grooms another&amp;rsquo;s fur, this is not merely a simple biological act — it&amp;rsquo;s a reflection of the two chimpanzees&amp;rsquo; social relationship, signifying that their bond is sufficiently strong, or that one seeks a favor from the other. A classic scenario in the triangular relationship: Nikkie (center) grooms his ally Yeroen (left), while Luit (right) sits alone at a short distance.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4428a0cfefba.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Males and Females in Power
 &lt;div id="males-and-females-in-power" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#males-and-females-in-power" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Although males are generally stronger than females, male chimpanzees do not use their full strength when attacking females. Males only bite and tear at each other when facing another male.&lt;/p&gt;
&lt;p&gt;Social mammal groups are typically composed of many females and a few males. Females also play an important role in power struggles.&lt;/p&gt;
&lt;p&gt;Female chimpanzees tend to avoid competition because they need a safer, more stable environment to raise offspring. Power transitions in the group do not happen instantaneously — when Luit replaced Yeroen, the process took over two months. During those two months, the two chimpanzees repeatedly fought and reconciled. Female chimpanzees played a vital mediating role in this process. Females would proactively embrace both of them, breaking the tension during confrontations and working hard to push them toward reconciliation.&lt;/p&gt;
&lt;p&gt;Male leadership arises from strength, alliances, and support levels. Females also have a leader, but female leadership is determined by character and age. Females almost never need to fight each other; the probability of conflict between females is extremely low, and their hierarchical order can persist for many years.&lt;/p&gt;
&lt;p&gt;Social psychologists, through alliance-game testing, have found that males take more proactive action, while females place more emphasis on the atmosphere of the game. In competitive activities, men are all about achieving strategic objectives — they prefer to seize the &amp;ldquo;big&amp;rdquo; events. Women are more interested in individual connections, forming alliances with those they like, and they focus on the immediate rather than distant political goals. Of course, these are statistical tendencies — exceptions always exist.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Power and Sex
 &lt;div id="power-and-sex" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#power-and-sex" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Avoiding incest is a moral or legal constraint in human society, often considered part of human culture. If mating were purposeless, would group-living chimpanzees have incest problems? In reality, such problems are extremely rare. Chimpanzees actively avoid incest. Mothers know who their sons are, and when a son reaches adulthood, chimpanzee mothers absolutely will not tolerate incestuous behavior. Young chimpanzees may not know who their fathers are, but they strongly resist mating with males roughly their father&amp;rsquo;s age. Biologists believe incest avoidance is a natural law deeply embedded in culture.&lt;/p&gt;
&lt;p&gt;Power and sex are certainly linked. Chimpanzee alphas typically enjoy extremely high mating privileges — until overthrown by a rebel. But these mating privileges occur during ordinary times; female chimpanzees will secretly mate with males they treated coldly during the day — at night, or in places the alpha can&amp;rsquo;t see, like in the tall grass. How similar this is to human society needs no elaboration.&lt;/p&gt;
&lt;p&gt;Jealousy produces more offspring. Chimpanzee social structure includes multiple females and males. More jealous males will do everything to prevent other males from contacting females, giving themselves more opportunities to sire offspring — and those offspring, in turn, will also be more jealous. Females, however, are entirely different: no matter whom she mates with, her number of offspring is fixed, and the offspring are always hers. So jealousy among females is not pronounced. But in pair-bonding species, things look completely different — in pair-bonding species, females also engage in sexual competition. In such cases, females are more inclined to maintain long-term relationships with males. In modern human society, men care more about whether their female partner has had sex with another man; women care more about whether their partner has fallen in love with another woman. At its essence, even the cornerstone of human society — the family — is merely a unit of sex and reproduction.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Closing
 &lt;div id="closing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#closing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s actually a lot more interesting material I haven&amp;rsquo;t gotten to — too lazy to expand further. Some perspectives I personally really like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Humans are engaged in continuous office competition while simultaneously uniting against a common enemy.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Hierarchical order is a cohesive factor that imposes limits on competition and conflict.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The roots of politics are far older than humanity.&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Universal Safety Disclaimer
 &lt;div id="universal-safety-disclaimer" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#universal-safety-disclaimer" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Large portions of this article are drawn from the book &lt;em&gt;Chimpanzee Politics&lt;/em&gt; and do not represent my personal views.&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Homo Deus: A Brief History of Tomorrow</title><link>https://lastdba.com/en/2024/08/12/book-notes-homo-deus-a-brief-history-of-tomorrow/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-homo-deus-a-brief-history-of-tomorrow/</guid><description>&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Homo Deus: A Brief History of Tomorrow&lt;/em&gt; is one of the trilogy by Israeli historian Yuval Noah Harari. The trilogy consists of &lt;em&gt;Sapiens: A Brief History of Humankind&lt;/em&gt;, &lt;em&gt;Homo Deus: A Brief History of Tomorrow&lt;/em&gt;, and &lt;em&gt;21 Lessons for the 21st Century&lt;/em&gt;. The most famous, of course, is &lt;em&gt;Sapiens&lt;/em&gt; — an extraordinarily sweeping book about the history of human civilization that can absolutely reshape your view of history. Last year (2022), I stubbornly gnawed through the English original of &lt;em&gt;Sapiens&lt;/em&gt; page by page — quite an achievement. Because I loved &lt;em&gt;Sapiens&lt;/em&gt; so much, &lt;em&gt;Homo Deus&lt;/em&gt;, the sequel from this giant of a thinker, naturally became this year&amp;rsquo;s most important &amp;ldquo;extracurricular reading.&amp;rdquo;&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Homo Deus: A Brief History of Tomorrow&lt;/em&gt; is one of the trilogy by Israeli historian Yuval Noah Harari. The trilogy consists of &lt;em&gt;Sapiens: A Brief History of Humankind&lt;/em&gt;, &lt;em&gt;Homo Deus: A Brief History of Tomorrow&lt;/em&gt;, and &lt;em&gt;21 Lessons for the 21st Century&lt;/em&gt;. The most famous, of course, is &lt;em&gt;Sapiens&lt;/em&gt; — an extraordinarily sweeping book about the history of human civilization that can absolutely reshape your view of history. Last year (2022), I stubbornly gnawed through the English original of &lt;em&gt;Sapiens&lt;/em&gt; page by page — quite an achievement. Because I loved &lt;em&gt;Sapiens&lt;/em&gt; so much, &lt;em&gt;Homo Deus&lt;/em&gt;, the sequel from this giant of a thinker, naturally became this year&amp;rsquo;s most important &amp;ldquo;extracurricular reading.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Sapiens&lt;/em&gt; tells the story of human history — from &lt;em&gt;Homo sapiens&lt;/em&gt; standing upright to launching rockets to explore the stars: how did we get here? &lt;em&gt;Homo Deus&lt;/em&gt; discusses the critical issues currently facing human civilization, and where we are headed.&lt;/p&gt;
&lt;p&gt;This copy of &lt;em&gt;Homo Deus&lt;/em&gt; was hard to come by. In the end, I bought a second-hand Chinese edition from JD — it came from the library of Xingtan Liang Qiuju Middle School &amp;#x1f604;. When I opened to the first page, a cheeky middle schooler had left a line of English. Let&amp;rsquo;s start with that:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;When facing the ultimate questions of this chaotic world, we need Chinese readers to contribute their wisdom.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/43ce9fb701c5.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;The New Agenda
 &lt;div id="the-new-agenda" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-new-agenda" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Famine
 &lt;div id="famine" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#famine" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Open almost any history book, and you&amp;rsquo;ll read about the horrors of famine and the insane behavior of people pushed to starvation. There&amp;rsquo;s no need to bring up famines in other countries — the most noteworthy case is right here in China. From the earliest written records all the way to the 20th century, China suffered the ravages of famine for thousands of years. We&amp;rsquo;ve always been an agricultural nation; nearly everyone had to work the land to feed themselves and their families. If crops failed — due to natural disasters (too much or too little rain, locust plagues, etc.) or human interference (bandits, oppressive taxes, irregular planting) — some people would face food shortages. Most modern people have no idea what it feels like to go without food for days on end. I&amp;rsquo;ve been hungry for stretches myself, and I know that prolonged hunger is a misery the average person can&amp;rsquo;t imagine — but even I was never at risk of starving to death. Yet our ancestors, facing the prospect of actually starving to death, what kind of despair must they have felt? They had no solution but to pray to the gods for favorable weather and a bountiful harvest the following year.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a line from &lt;em&gt;House of Cards&lt;/em&gt; that really stuck with me: &amp;ldquo;Twenty years ago, I couldn&amp;rsquo;t buy sugar in China. Now I can buy it anywhere.&amp;rdquo; Crude as it sounds, it reflects a reality: the Chinese people have escaped poverty. For the first time in Chinese history, we are no longer tormented by famine. We created this economic miracle — something worth recording! Similarly, human civilization as a whole has recently solved the problem of hunger. Food shortages in particular regions are almost always caused by political factors, and internationally, there are ample surplus resources for emergency response to shortages. Food scarcity is no longer a human agenda item.&lt;/p&gt;
&lt;p&gt;On the contrary, humanity is no longer concerned with food shortages but is starting to worry about food &lt;em&gt;surpluses&lt;/em&gt;. Health problems caused by obesity and malnutrition far outnumber those caused by starvation. Many people mindlessly chew through bread, rice, and loads of carbohydrates without getting enough protein and vitamins. The rich eat lettuce salads; poor Westerners eat cake, burgers, and pizza; and I eat fried dough sticks, steamed buns, rice, and noodles — my weight keeps climbing every day, and my health problems multiply year by year&amp;hellip;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Bacteria and Viruses
 &lt;div id="bacteria-and-viruses" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bacteria-and-viruses" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;The Black Death&lt;/strong&gt;: In the 1330s, the Black Death — the bacterium &lt;em&gt;Yersinia pestis&lt;/em&gt; — caused 70 to 200 million deaths worldwide, with a mortality rate of roughly 50%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Spanish Flu&lt;/strong&gt;: 1918. Infected 500 million people; 50 to 100 million died. Mortality rate around 15%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Smallpox&lt;/strong&gt;: 1967 — 15 million infected, 2 million deaths. Mortality rate about 15%. Following global smallpox vaccination, the smallpox virus was eradicated by humanity in 1979.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AIDS&lt;/strong&gt;: Broke out in the 1980s. Over 30 million deaths. Destroys the immune system. Current medications are effective but cannot provide a perfect cure. Infection rate: 0.9%. Mortality rate: 1.28 per 100,000.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SARS&lt;/strong&gt;: 2003. 8,000 infected, over 700 deaths.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Avian Flu&lt;/strong&gt;: Fewer than 1,000 deaths.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;H1N1 Swine Flu&lt;/strong&gt;: 2009. 700 million to 1.4 billion infected. Approximately 150,000 to 600,000 deaths. Infection rate: 20%. Mortality rate: ~0.02%.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Ebola&lt;/strong&gt;: Multiple outbreaks in Africa. Mortality rate above 50%.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of the above, only the Black Death is bacterial; all the rest are viral. The Black Death is too ancient; though bacterial, due to the primitive state of medical care at the time, people had no idea what was happening, leading to massive casualties and an extraordinarily high fatality rate. Smallpox is humanity&amp;rsquo;s greatest success story in the war against viruses — through modern medicine and vaccines, we outright eliminated the smallpox virus. As you can see, humanity has developed a silver bullet for bacteria — antibiotics. Bacterial epidemics are essentially gone. But for viral influenzas, they keep emerging in an endless cycle: as one subsides, another rises. There&amp;rsquo;s no great solution; modern medicine still has room to improve against viral epidemics. Major viral pandemics still strike every few years, and seasonal flu never stops accompanying us.&lt;/p&gt;
&lt;p&gt;Most of these influenzas are weathered by the human immune system alone — modern medicine only plays a supporting role (basically, bringing down fevers). Especially for human infants: aside from getting all manner of vaccines right after birth, every other &amp;ldquo;cold&amp;rdquo; has to be tough out by their own immune systems, with very few effective medications available. Kindergarten is less a place of learning and more a trial ground for human influenza and immune resistance.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;COVID-19&lt;/strong&gt;: &lt;em&gt;Homo Deus&lt;/em&gt; was published in 2015, before COVID-19 happened. The author&amp;rsquo;s view on epidemics was: &amp;ldquo;Doctors can quickly get up to speed and rapidly discover treatments — humanity has probably already conquered epidemics.&amp;rdquo; I wonder what Yuval Noah Harari makes of COVID-19.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Regarding COVID-19, there&amp;rsquo;s simply too much I want to say. So many grievances that they defy coherent complaint. In a single sentence: &amp;ldquo;On the matter of COVID-19, humanity was utterly shattered and exposed in all its ugliness.&amp;rdquo;&lt;/p&gt;

&lt;h3 class="relative group"&gt;War
 &lt;div id="war" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#war" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Skipped.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Humanism
 &lt;div id="humanism" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#humanism" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The concept of humanism emerged during the Renaissance, championing human rights and individual value in opposition to the religious theocracy of the time. It reached China roughly in the late Qing dynasty. Humanism has had a profound impact on modern society: people increasingly emphasize the concepts of the individual or the collective, rather than top-down religious dogma and the divine right of kings.&lt;/p&gt;
&lt;p&gt;Humanism advocates human rights against divine right, and individual freedom against personal dependency. What humanism worships is human nature — the human being itself.&lt;/p&gt;
&lt;p&gt;People are always contemplating the meaning of life. Humanism holds that humanity itself is the source of meaning: &amp;ldquo;I am the meaning.&amp;rdquo; It also holds that free will is the highest authority. Humanism proposes a new life principle: &amp;ldquo;If I feel it&amp;rsquo;s good, it&amp;rsquo;s good; if I feel it&amp;rsquo;s bad, it&amp;rsquo;s bad.&amp;rdquo; For example: if a woman has an affair, in pre-humanist society, she would face punishment from religion and social norms — the censure of priests and elders. In modern society, she need only heed her own true feelings; the best approach is to ask her own heart what it thinks.&lt;/p&gt;
&lt;p&gt;For society as a whole: what everyone believes is good &lt;em&gt;is&lt;/em&gt; &amp;ldquo;good&amp;rdquo;; what everyone believes is bad &lt;em&gt;is&lt;/em&gt; &amp;ldquo;bad.&amp;rdquo; Take theft, for example. For the victim, it&amp;rsquo;s certainly bad. For everyone else, it&amp;rsquo;s also bad — because others don&amp;rsquo;t want to be stolen from either, including thieves themselves. Thus, theft is bad, and people can even write it into a mutually binding document. By the same logic, if a certain behavior feels bad to no one at all, then it&amp;rsquo;s not wrong. This naturally leads to the question of homosexuality: two people of the same sex feel that this is good, and it affects no one else — therefore, it&amp;rsquo;s not wrong. So humanism supports homosexuality and opposes religion.&lt;/p&gt;
&lt;p&gt;Humanism can perfectly address these two types of extreme questions. But for events that are good for some and bad for others — like the trolley problem — it&amp;rsquo;s much harder to answer. In ancient societies, Confucianism advocated that women remain faithful to one husband unto death, even erecting chastity archways. In modern society, as long as one can find happy days, people don&amp;rsquo;t want to stay bound in misery. But what if divorce leads to happiness for one side and utter misery for the other? Add the emotional harm to the children, and the whole situation becomes very hard to measure: whose happiness matters more? Humanism will only tell you: &amp;ldquo;Follow your own heart.&amp;quot;~&lt;/p&gt;
&lt;p&gt;As humanism gained broader acceptance, it evolved into three major branches:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Liberal Humanism&lt;/strong&gt;: The &amp;ldquo;orthodox&amp;rdquo; liberal humanism, also known as liberalism. The individual enjoys freedom; individual choice is respected. If it feels right to each person, it&amp;rsquo;s right. The classic example is liberalism&amp;rsquo;s belief that the ballot box represents individual will. But this requires one precondition: before voting, everyone must be &amp;ldquo;one of us.&amp;rdquo; For instance, the American North and South in 1861, or Israel and Palestine today — neither could possibly resolve their issues by having everyone vote together.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Socialist Humanism&lt;/strong&gt;: Socialist humanism doesn&amp;rsquo;t focus on individual feelings, viewing them as a bourgeois trap. What &amp;ldquo;I&amp;rdquo; feel in the present moment is merely a reflection of my environment, determined by my class. Liberalism believes voters can make the best choice; socialist humanism believes the organization can make the best choice. The individual must obey the organization&amp;rsquo;s decisions, not personal feelings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Evolutionary Humanism&lt;/strong&gt;: Evolutionary humanism derives from Darwin&amp;rsquo;s theory of evolution. It holds that conflict is a form of evolution — eliminating the weak, survival of the fittest. Superior people deserve to survive; this is the law of human evolution. Evolutionary humanism was once all the rage, giving rise to many ideas such as eugenics, racism, and fascism.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;From 1914 to 1989, the three humanisms waged a war of faith. Liberalism and socialism joined forces to defeat Nazism in World War II. Then liberal nations and the Soviet Union each rallied allies into the Cold War. In the early Cold War, socialism consistently held the upper hand (the documentary &lt;em&gt;The Vietnam War&lt;/em&gt; is highly recommended here) — students at UC Berkeley even kept Chairman Mao&amp;rsquo;s Little Red Book by their bedsides. Then, everything changed. The Soviet Union collapsed. Many countries shifted their beliefs; we too introduced market capitalism. People preferred supermarkets (or Taobao) and money-making companies over a system that allocated food and clothing. Liberalism won a sweeping victory in this war of faith — they even evolved further, adopting ideas and institutions from their rivals to provide better education, healthcare, and social security than before. But liberalism&amp;rsquo;s core ideology remained unchanged.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Dataism
 &lt;div id="dataism" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#dataism" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Dataism holds the following three views:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Organisms are algorithms.&lt;/li&gt;
&lt;li&gt;Intelligence can exist without consciousness.&lt;/li&gt;
&lt;li&gt;Highly intelligent algorithms know me better than I know myself.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f11b7c00c9f7.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Organisms Are Algorithms
 &lt;div id="organisms-are-algorithms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#organisms-are-algorithms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&amp;ldquo;Organisms are algorithms&amp;rdquo; — I couldn&amp;rsquo;t accept this notion when I first encountered it either. How could organisms be algorithms? Doesn&amp;rsquo;t human experience matter? Is human consciousness worthless?&lt;/p&gt;
&lt;p&gt;Looking at capitalism and Soviet-style communism from the perspective of data processing, they are no longer ideological opposites but rather different data algorithms. Capitalism employs a distributed algorithm; Soviet-style communism employs a centralized algorithm. Capitalism allows connections between consumers and producers, permits individuals to freely exchange information and make independent decisions — the pricing and output of goods are determined by the free market. Soviet-style communism, on the other hand, severed the link between producers and consumers: the government collected consumption data and issued production directives to producers. The government took all of the workers&amp;rsquo; productive surplus, then determined what each individual needed, then re-distributed accordingly. Tax rates work the same way — high tax rates essentially concentrate more resources together, with the government as a single processor deciding how resources are allocated and utilized.&lt;/p&gt;
&lt;p&gt;A single processor can&amp;rsquo;t possibly make the right decisions forever. No one person can handle such enormous amounts of data — even today&amp;rsquo;s high-speed computers can&amp;rsquo;t process it all.&lt;/p&gt;
&lt;p&gt;From the perspective of Dataism, capitalism won the Cold War because its distributed algorithm was better suited to that era than Soviet-style communism&amp;rsquo;s centralized algorithm: the better data algorithm prevailed. When we chose to embrace the market economy and abandon Soviet-style communism, it was equivalent to decentralizing processing power to every individual, no longer using the single-processor model. That&amp;rsquo;s why Socialism with Chinese Characteristics survived the Cold War, while the Soviet single-processor data model failed utterly. Currently, only a very few authoritarian states still use this single-processor model — and after all these years, we&amp;rsquo;ve seen no productivity advances from them. This is also a real-world reflection of &amp;ldquo;organisms are algorithms.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m merely using the Dataist lens to view economic models here — no intention of judging which model is better or worse. Beyond fitting economic models so neatly, Dataism can also be applied to view problems in many other domains.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Intelligence Can Exist Without Consciousness
 &lt;div id="intelligence-can-exist-without-consciousness" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#intelligence-can-exist-without-consciousness" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;First, we need to be clear: what is consciousness? Someone might say, &amp;ldquo;Consciousness is the self,&amp;rdquo; or &amp;ldquo;Consciousness is the voice inside.&amp;rdquo; These don&amp;rsquo;t answer the question scientifically. Note: science deals with objective facts; subjective matters fall outside science&amp;rsquo;s domain — they belong to theology. We cannot explain the subjective using the subjective. In truth, humanity still hasn&amp;rsquo;t figured out what consciousness is.&lt;/p&gt;
&lt;p&gt;If every person is an algorithm, then there&amp;rsquo;s really no concept of &amp;ldquo;autonomous consciousness.&amp;rdquo; We can regard what we hear, smell, and see as &amp;ldquo;input data.&amp;rdquo; After computation by our biological organism, a response is produced and an action taken — that&amp;rsquo;s the &amp;ldquo;output data.&amp;rdquo; The human body itself is more like a CPU — perhaps one that can self-regulate, but even the regulation itself requires data input, like learning knowledge or exercising. So what role does &amp;ldquo;self-consciousness&amp;rdquo; play in this process? I can clearly make choices about something — if I choose differently, a different outcome results. I must be consciously aware&amp;hellip; right? This question may not be so easy to answer. If, hypothetically, there were no subjective consciousness — not brain death, but &amp;ldquo;I can&amp;rsquo;t feel my self&amp;rdquo; — would &amp;ldquo;I&amp;rdquo; still make different choices?&lt;/p&gt;
&lt;p&gt;From a biological perspective, consciousness is nothing more than countless electrical currents in the brain&amp;rsquo;s neural network. When &amp;ldquo;I&amp;rdquo; make a different choice, it may simply be that some nerve ending fired an extra tiny electrical pulse. &amp;ldquo;Self-consciousness&amp;rdquo; played no role whatsoever in this process. Without &amp;ldquo;me,&amp;rdquo; it seems my body could still make different choices, as long as the &amp;ldquo;algorithm&amp;rdquo; stored in my body still exists. If &amp;ldquo;self-consciousness&amp;rdquo; exists, it&amp;rsquo;s more like a belief rather than an objective fact — like believing in God. The most cutting-edge biological science suggests that consciousness is merely a byproduct of an individual organism&amp;rsquo;s algorithms — it could even be viewed as a kind of mental pollution.&lt;/p&gt;
&lt;p&gt;Then we arrive at another question: is artificial intelligence (AI) conscious? If it&amp;rsquo;s not conscious, can we treat it as an intelligent being? The best method humans currently have for testing whether AI has consciousness is the Turing Test. The Turing Test&amp;rsquo;s logic is simple: as long as a normal human can&amp;rsquo;t tell whether the AI is human or not, it passes. In other words, once AI becomes smart enough, we humans have no choice but to consider it &amp;ldquo;conscious.&amp;rdquo;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Algorithms Know Me Better Than I Know Myself
 &lt;div id="algorithms-know-me-better-than-i-know-myself" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#algorithms-know-me-better-than-i-know-myself" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Humanism calls on us to listen to our inner authentic voice. But if the self doesn&amp;rsquo;t even exist, what is there to listen to? Dataism calls on us to &amp;ldquo;listen to the algorithm&amp;rsquo;s advice&amp;rdquo; — the algorithm knows me better than I know myself. For example: when a woman is on a blind date and meets two men who both seem suitable, without algorithmic assistance, she would follow her inner voice and choose the one who &amp;ldquo;feels&amp;rdquo; more right. Now imagine an algorithm tells her: &amp;ldquo;I know you very well. I know you&amp;rsquo;re attracted to Man A; you&amp;rsquo;ll choose him. But he will ultimately break your heart and leave you. Man B is the one for you — and if you choose B, you&amp;rsquo;ll fall in love just as quickly. He will give you lasting happiness. This is a choice you won&amp;rsquo;t regret.&amp;rdquo; From any angle, shouldn&amp;rsquo;t she listen to the algorithm&amp;rsquo;s advice rather than that fleeting feeling of the moment?&lt;/p&gt;
&lt;p&gt;In its early R&amp;amp;D phase, algorithms are built by engineers continuously piling up code. At this stage, people still have a decent grasp of what the algorithm is &amp;ldquo;thinking.&amp;rdquo; But algorithms can self-learn and self-update. Their learning capacity is utterly beyond human comparison. They will gradually carve out their own path, until humans can no longer keep up.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Closing Thoughts
 &lt;div id="closing-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#closing-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Homo Deus&lt;/em&gt; is, as ever, packed with substance — novel, robust ideas, all-encompassing. A highly recommended work. While reading, I often paused to reflect: does what he&amp;rsquo;s saying match reality? Is it correct? Many times, I felt shocked. I used to never underline when reading books, but I did so with this one. When I finished, I found the book covered in my highlights.&lt;/p&gt;
&lt;p&gt;This monumental work is so dense with content that this article can&amp;rsquo;t possibly cover everything. This piece is relatively one-sided — I&amp;rsquo;ve mostly only discussed productivity-related viewpoints. There&amp;rsquo;s actually a great deal of other fascinating material, such as the book&amp;rsquo;s perspective on &amp;ldquo;happiness&amp;rdquo;: &amp;ldquo;Would you rather be an unhappy but wealthy Singaporean, or a happy but poor Costa Rican?&amp;rdquo; I don&amp;rsquo;t know how I&amp;rsquo;d answer. But if the author rephrased the question to me as: &amp;ldquo;Would you rather eat more hot pot, or eat vegetables and whole grains every day, maintaining a nutritionally balanced, healthy body?&amp;rdquo; — then I&amp;rsquo;d definitely answer: hot pot.&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Romance of the Three Kingdoms</title><link>https://lastdba.com/en/2024/08/12/book-notes-romance-of-the-three-kingdoms/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-romance-of-the-three-kingdoms/</guid><description>&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Mention &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt;, and it seems almost everyone can name a few characters or plot points. But have you actually read the original?&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve always been a fan of Three Kingdoms-themed games — titles like &lt;em&gt;Bàwáng Dàlù&lt;/em&gt; (The Overlord&amp;rsquo;s Continent) and &lt;em&gt;Total War: Three Kingdoms&lt;/em&gt; are among my favorites. I love the feeling of collecting famous generals and rampaging across the battlefield. But thinking back, I realized I&amp;rsquo;d never actually read &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; in its entirety. Some of those generic officers in Total War — I had no idea who they were. And when I thought about it, I couldn&amp;rsquo;t come up with a single novel that could stand toe-to-toe with &lt;em&gt;Romance&lt;/em&gt;, so I decided to give the original a try. Once I started, I couldn&amp;rsquo;t stop&amp;hellip;&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Mention &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt;, and it seems almost everyone can name a few characters or plot points. But have you actually read the original?&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve always been a fan of Three Kingdoms-themed games — titles like &lt;em&gt;Bàwáng Dàlù&lt;/em&gt; (The Overlord&amp;rsquo;s Continent) and &lt;em&gt;Total War: Three Kingdoms&lt;/em&gt; are among my favorites. I love the feeling of collecting famous generals and rampaging across the battlefield. But thinking back, I realized I&amp;rsquo;d never actually read &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; in its entirety. Some of those generic officers in Total War — I had no idea who they were. And when I thought about it, I couldn&amp;rsquo;t come up with a single novel that could stand toe-to-toe with &lt;em&gt;Romance&lt;/em&gt;, so I decided to give the original a try. Once I started, I couldn&amp;rsquo;t stop&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; is written in the vernacular Chinese of the ancient period, which differs somewhat from modern vernacular Mandarin. For example, &amp;ldquo;暗赍金帛，结交中涓封谞&amp;rdquo; means secretly bringing gold and silk to befriend the eunuch Feng Xu (赍, pronounced &lt;em&gt;jī&lt;/em&gt;, means &amp;ldquo;to bring&amp;rdquo; — a common exam term; 中涓 was a close-attendant official title, later used to refer to eunuchs in general). At first, it was admittedly hard going, but after a while it became quite smooth. When I didn&amp;rsquo;t understand something, I&amp;rsquo;d just check the annotations or underline it (once again, thank you, e-books). Also, a reading tip: skip the preface. I recommend mentally filtering out keywords like &amp;ldquo;peasant uprising&amp;rdquo;, &amp;ldquo;dialectical&amp;rdquo;, &amp;ldquo;feudal&amp;rdquo;&amp;hellip;&lt;/p&gt;
&lt;p&gt;Many people confuse &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; with &lt;em&gt;Records of the Three Kingdoms&lt;/em&gt;. Let me emphasize this: &lt;strong&gt;&lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; is a novel; &lt;em&gt;Records of the Three Kingdoms&lt;/em&gt; is an official history&lt;/strong&gt;. Some might counter, &amp;ldquo;But the &lt;em&gt;Records&lt;/em&gt; was privately compiled,&amp;rdquo; or &amp;ldquo;There is no single truth in history.&amp;rdquo; You can believe there&amp;rsquo;s no absolute truth in history, but if you carry that attitude into historical scholarship, then there&amp;rsquo;s no point studying history at all. Not every statement in an official history is precise — some contain ambiguous or even contradictory accounts — but that only affects its reference value, not its status as an official history. The &lt;em&gt;Records of the Three Kingdoms&lt;/em&gt; is one of the Twenty-Four Histories, an undisputed official history, beyond all doubt. &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; is a novel written with deep reference to the &lt;em&gt;Records&lt;/em&gt;, upon which artistic embellishments were layered.&lt;/p&gt;
&lt;p&gt;Unless I explicitly mention the &lt;em&gt;Records&lt;/em&gt;, this piece discusses the novel alone. Although I dipped into the &lt;em&gt;Records&lt;/em&gt; (and discovered that Total War draws primarily from the &lt;em&gt;Records&lt;/em&gt; 👍), I found it too hardcore and decided to give up. In any case, the novel&amp;rsquo;s characters and plotlines all involve artistic license and differ from history — readers, please keep the distinction in mind.



&lt;img src="https://lastdba.com/img/csdn/86043abcfb26.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Red Cliffs
 &lt;div id="red-cliffs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#red-cliffs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;There are many plotlines in Three Kingdoms worth discussing, but given space constraints (or, honestly, I just don&amp;rsquo;t feel like writing more), I&amp;rsquo;ll focus on Red Cliffs.&lt;/p&gt;
&lt;p&gt;The Battle of Red Cliffs is undoubtedly the crown jewel of the novel. All the great names make their entrance, stratagems fly thick and fast, and the intellectual duels between Zhou Yu and Kongming (Zhuge Liang) elevate the battle to the pinnacle of wit. I&amp;rsquo;ve always been fond of Zhou Yu — brimming with talent, dashing and heroic, brave and resourceful, commanding armies with brilliance, achieving greatness young (a winner in life), with the ability of a king&amp;rsquo;s right-hand minister. But to highlight Zhuge Liang&amp;rsquo;s genius, the novel deliberately places Zhou Yu&amp;rsquo;s talents a notch below Kongming at every turn, making him Red Cliffs&amp;rsquo; absolute foil to set off Zhuge Liang. As I watched the TV series and read the novel, I increasingly felt that the early-period Zhuge Liang was simply a &amp;ldquo;monster&amp;rdquo; — &amp;ldquo;utterly inhuman.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Red Cliffs features an all-star cast. All three of Liu Bei&amp;rsquo;s top strategists were involved: Zhuge Liang, Pang Tong (then serving Wu), and Xu Shu (then in Cao Cao&amp;rsquo;s camp) each played critical roles in the battle&amp;rsquo;s schemes. The warriors basically just cleaned up — only Liu Bei and Guan Yu visited Wu&amp;rsquo;s naval camp, and Zhao Yun once rescued the strategist. Wu itself was the protagonist, naturally — Zhou Yu, Lu Su, Huang Gai, Gan Ning, Kan Ze all had major parts, and the rest of the Wu officers were united in purpose, not a single one dragging their feet. On Cao Cao&amp;rsquo;s side: the Chancellor himself, advisers Cheng Yu and Xun You (Xun Yu and Jia Xu didn&amp;rsquo;t come; Guo Jia had died young), officers Mao Jie and Yu Jin, famous generals like Zhang Liao and Xu Chu essentially making cameo appearances, plus the tragic patsies Cai Mao, Zhang Yun, Cai Zhong, and Cai He, and the clown Jiang Gan&amp;hellip; I suspect many people have never read Red Cliffs carefully, or perhaps never read the original at all. I specifically drew a flowchart:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/272e97875e57.png" alt="Image description" /&gt;&lt;/p&gt;
&lt;p&gt;Two favorite passages:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;A gust of wind blew, lifting a corner of the banner to brush across Zhou Yu&amp;rsquo;s face. Yu suddenly recalled something weighing on his heart, let out a great cry, fell backwards, and vomited blood.&amp;rdquo; This brief passage is gripping, vivid as a film scene, and underscores the importance of the southeast wind — not a single word wasted.&lt;/p&gt;
&lt;p&gt;Yu said: &amp;ldquo;&amp;lsquo;Man&amp;rsquo;s fate shifts between morning and evening&amp;rsquo;; how can one guarantee one&amp;rsquo;s safety?&amp;rdquo; Kongming smiled and replied: &amp;ldquo;&amp;lsquo;The heavens hold storms none can foresee&amp;rsquo;; how can man predict them?&amp;rdquo; Yu turned pale upon hearing this and feigned moans of pain&amp;hellip; Kongming smiled: &amp;ldquo;I have a prescription that will settle the Commander&amp;rsquo;s distress.&amp;rdquo; The entire exchange never once mentions the east wind, yet Yu and Liang have already dueled several rounds over it. Truly brilliant~&lt;/p&gt;

&lt;h2 class="relative group"&gt;Embellishments
 &lt;div id="embellishments" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#embellishments" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Adding one&amp;rsquo;s own interpretations or plot elements on top of the original — I call these &amp;ldquo;embellishments.&amp;rdquo; The original plot is already extraordinarily compelling. Even where modern readers might find things hard to understand, if you immerse yourself in the mindset of ancient (Eastern Han!) people, there are virtually no logical gaps. This is one reason why &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; is held in such high regard. That&amp;rsquo;s why many people still prefer the old TV adaptation that respects the original (with minimal changes) over the new adaptation full of embellishments. Some embellishments — no one even knows who started them — conspiracy theorists abound, and many fabricated plotlines have become widely accepted as fact, which is truly a shame.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Kongming letting Cao Cao escape.&lt;/strong&gt; At Huarong Trail, where Guan Yu spares Cao Cao out of a sense of honor, Kongming deliberately sent Guan Yu knowing Cao Cao would be released. This has spawned countless interpretations. But in the original, Cao Cao escapes simply because Kongming, observing the stars at night, concluded that Cao Cao was not fated to die that night. Don&amp;rsquo;t dismiss this as childish — the novel treats &amp;ldquo;star-reading&amp;rdquo; as a &lt;em&gt;very real&lt;/em&gt; mystical phenomenon. &amp;ldquo;Read the stars and release Cao Cao&amp;rdquo; — this reason is entirely sufficient in the novel&amp;rsquo;s own terms. As for &amp;ldquo;they feared Wei&amp;rsquo;s retaliation so they let Cao Cao go&amp;rdquo; — pure later embellishment. &lt;em&gt;Romance&lt;/em&gt; never once features a plot where someone refrains from killing out of fear of retaliation. The same goes for Guan Yu&amp;rsquo;s death.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Guan Yu&amp;rsquo;s death.&lt;/strong&gt; In the original, both Wei and Wu wanted Guan Yu dead — such was the era, and in the novel, Guan Yu was a godlike figure. You couldn&amp;rsquo;t take Jing Province without killing him; both sides went all out. It&amp;rsquo;s true that later, when Liu Bei raised a great army for revenge, both Wu and Wei tried to pass the blame — but that&amp;rsquo;s all &lt;em&gt;after&lt;/em&gt; Guan Yu&amp;rsquo;s death. Also, many later commentators believe Guan Yu should have defended Jing Province rather than attacking. Here I must clear General Guan&amp;rsquo;s name: attacking Fancheng was Zhuge Liang&amp;rsquo;s order. Guan Yu simply failed to take it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pang Tong&amp;rsquo;s death.&lt;/strong&gt; The original says Kongming, observing the stars at night, saw a general&amp;rsquo;s star falling and sent a letter warning Liu Bei to be cautious. But Pang Shiyuan (Pang Tong) suspected Kongming was just afraid of him stealing glory and urging Liu Bei to advance slowly, so he in turn pressed Liu Bei to speed up the campaign — and ultimately died at Fallen Phoenix Slope. The new Three Kingdoms TV series embellished this: Liu Bei couldn&amp;rsquo;t bear to seize Yi Province, he knew there was an ambush but still entered Fallen Phoenix Slope, sacrificing himself to give Liu Bei a pretext to break with Liu Zhang&amp;hellip; (this embellishment honestly disgusts me). Liu Bei and Liu Zhang&amp;rsquo;s conflict actually escalated gradually — Liu Zhang&amp;rsquo;s subordinates were already fighting Liu Bei, but the final break came when Liu Zhang discovered Zhang Song&amp;rsquo;s letter of surrender and realized Liu Bei&amp;rsquo;s wolfish treachery. While we&amp;rsquo;re here, let&amp;rsquo;s discuss a frequently debated detail: did Liu Bei give Pang Tong the Dílú horse? (The Dílú was said to bring misfortune to its rider; Xu Shu once advised Liu Bei to gift it to an enemy to avert the curse, then ride it himself — but immediately said he was merely testing Liu Bei&amp;rsquo;s character.) I&amp;rsquo;ve seen many comments assuming Liu Bei gave Pang Tong the Dílú, but reading the original carefully, it&amp;rsquo;s actually quite ambiguous. Liu Bei gave Pang Tong a white horse, but it&amp;rsquo;s never specified as the Dílú. In fact, after leaping across Tan Stream, the Dílú basically vanishes from the story. If it truly brought misfortune, Liu Bei had given it to Liu Biao (who returned it upon learning of the curse) — and Liu Biao died anyway. If it didn&amp;rsquo;t truly bring misfortune, that&amp;rsquo;s also plausible. &lt;em&gt;Romance&lt;/em&gt; doesn&amp;rsquo;t treat all mystical elements as absolute truth: believing in them can be called respecting the spirits; disbelieving can be called being an extraordinary man or hero. Liu Bei and the Dílú lean more toward the latter, because Xu Shu was really just testing Liu Bei&amp;rsquo;s benevolence: &amp;ldquo;A man&amp;rsquo;s life and death are determined by fate — how could a horse be the cause?&amp;rdquo; If the horse truly brought misfortune, Xu Shu wouldn&amp;rsquo;t have said &amp;ldquo;I was testing you.&amp;rdquo; So personally, I believe what Liu Bei gave was not the Dílú, but simply one of his ordinary white horses. For a lord to gift his own horse was an immense honor in ancient times — this was simply meant to show Liu Bei&amp;rsquo;s genuine affection for Pang Tong. Later generations just preferred the Dílú storyline and embellished accordingly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Diaochan&amp;rsquo;s righteousness.&lt;/strong&gt; Only very rarely do embellishments improve things. In the original, the eighteen lords&amp;rsquo; coalition was utterly helpless against the Western Liang army. After entering Luoyang, they all went their separate ways — while the Emperor remained in Dong Zhuo&amp;rsquo;s clutches&amp;hellip; And then, contrast this with Diaochan, a mere woman: solely to repay Minister Wang Yun for raising her (Diaochan was his adopted daughter), she offered her body and successfully drove a wedge between Dong Zhuo and Lü Bu. After this, the novel mentions Diaochan very little (she simply follows Lü Bu, with no further plot involvement). The Three Kingdoms TV adaptation&amp;rsquo;s treatment of Diaochan after her success is truly brilliant. The old Three Kingdoms series adds an epilogue for her: to a hauntingly beautiful melody, Diaochan retreats into obscurity after her great deed, never to be heard from again. The fate of a nation rested on a frail woman — starkly contrasting with the warlords&amp;rsquo; failure against Dong Zhuo and their secret scheming against each other. This segment is exquisite. Diaochan is a true hero! Compare this to the new Three Kingdoms&amp;rsquo; treatment of Diaochan: pure schlock, fabricating a romance between Lü Bu and Diaochan — utterly an embellishment, disrespecting the original and even disrespecting Diaochan.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Mysticism
 &lt;div id="mysticism" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mysticism" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;It&amp;rsquo;s a novel, after all — many plot points are dramatized additions (the same goes for &lt;em&gt;Water Margin&lt;/em&gt; and others). A bit of artistic license for reading pleasure is &amp;ldquo;the finishing touch on a dragon painting,&amp;rdquo; not &amp;ldquo;drawing legs on a snake.&amp;rdquo; Personally, I prefer to read &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; as a fantasy novel rather than a historical one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Yellow Turban Rebellion.&lt;/strong&gt; The Yellow Turban Rebellion was less a peasant uprising than a religious war. At first, seeing Zhang Jiao cure people with talisman water, I assumed the author was portraying the Yellow Turbans as uncivilized charlatans. Then I discovered that Yu Ji also cured people with talisman water — and Yu Ji is clearly a positive character. Sun Ce disbelieved and ended up being mystically killed by the Little Conqueror. So talisman-water healing is a real thing in the author&amp;rsquo;s universe. The three Zhang brothers genuinely possess supernatural abilities, and the Yellow Turban army is basically a religious sect. I eventually accepted the talisman-water premise.&lt;/p&gt;
&lt;p&gt;&amp;ldquo;His ears hung down to his shoulders, his hands reached past his knees, and his eyes could see his own ears.&amp;rdquo; Hands past the knees, fine — but eyes that can see your own ears? That&amp;rsquo;s not an eye problem, that&amp;rsquo;s an ear problem. The man was probably an elephant&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Killing one&amp;rsquo;s wife for food.&lt;/strong&gt; While fleeing and seeking sustenance, Liu Bei encounters a hunter. Having found no game, the hunter &lt;em&gt;kills his wife and serves her as food&lt;/em&gt;. Liu Bei only realizes the previous night&amp;rsquo;s meal was the man&amp;rsquo;s wife: &amp;ldquo;overcome with sorrow, he shed tears and mounted his horse.&amp;rdquo; When Cao Cao hears of the &amp;ldquo;kill-wife-for-food&amp;rdquo; incident, &amp;ldquo;Cao ordered Sun Qian to reward him with a hundred taels of gold.&amp;rdquo; Even someone like me, with fairly open views, was utterly shocked reading this. It&amp;rsquo;s astonishing how different ancient values were from ours, and lamentable how low women&amp;rsquo;s status was — mere objects&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Star-reading.&lt;/strong&gt; In ancient times, star-reading was an official government post. In &lt;em&gt;Romance&lt;/em&gt;, it&amp;rsquo;s a skill possessed by high-level strategists. Pang Tong lacks star-reading ability; Zhuge Liang and Sima Yi possess it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rǎng (禳).&lt;/strong&gt; Actively attempting to alter fate. When Liu Bei&amp;rsquo;s Dílú horse threatened misfortune, the method Xu Shu described to dispel the calamity was called a &lt;em&gt;rǎng&lt;/em&gt; ritual. Zhuge Liang used the &lt;em&gt;qí-rǎng&lt;/em&gt; ritual to pray to the Northern Dipper, seeking to extend his life by one &lt;em&gt;jì&lt;/em&gt; (twelve years).&lt;/p&gt;

&lt;h2 class="relative group"&gt;Flaws
 &lt;div id="flaws" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#flaws" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Some important characters are described too sketchily. &amp;ldquo;By the time Song arrived, Zhang Jiao was already dead&amp;rdquo; — just a handful of words dismiss the death of the leader who ignited the earth-shaking Yellow Turban Rebellion. I always find this hard to accept; the author doesn&amp;rsquo;t even tell us &lt;em&gt;how&lt;/em&gt; Zhang Jiao died. (If you Baidu it, you&amp;rsquo;ll just get middle-school history memorization paragraphs about why the Yellow Turban uprising failed&amp;hellip;)&lt;/p&gt;
&lt;p&gt;Some plotlines are repetitive. The famous &amp;ldquo;Borrowing Arrows with Straw Boats&amp;rdquo; actually appeared earlier. Sun Jian, while attacking Huang Zu, had a similar arrow-borrowing episode: &amp;ldquo;Jian plucked the arrows embedded in his boats, amounting to over a hundred thousand.&amp;rdquo; Red Cliffs and Yiling share similarities too — &amp;ldquo;southeast wind,&amp;rdquo; &amp;ldquo;boats loaded with thatch,&amp;rdquo; and &amp;ldquo;fire attack&amp;rdquo; are all keywords of Yiling as well. The Girdle Edict in the early chapters is a compelling storyline, and later there&amp;rsquo;s a parallel with Wei Emperor Cao Fang&amp;rsquo;s blood-written edict.&lt;/p&gt;
&lt;p&gt;After Zhuge Liang&amp;rsquo;s death, the later plot isn&amp;rsquo;t very engaging. By then, almost everyone I knew was dead. There&amp;rsquo;s Jiang Wei and Deng Ai to follow, perhaps, but the plotlines are formulaic and dull. The new characters are numerous but lack distinctive portrayals — you basically can&amp;rsquo;t remember them. Later battle scenes all follow the same template: feign defeat, lure the enemy deep, a cannon blast, then charge.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Character Biographies
 &lt;div id="character-biographies" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#character-biographies" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A sharp-tongued review of several characters, with brief introductions, summaries, and key deeds. Though it doesn&amp;rsquo;t quite align with the novel&amp;rsquo;s spirit, people always love debating martial prowess and intelligence scores. Having read the entire novel, I&amp;rsquo;ll try to discuss the numbers here too.&lt;/p&gt;
&lt;p&gt;For one-on-one combat ratings, it&amp;rsquo;s not about who defeated whom — &lt;em&gt;Romance&lt;/em&gt; features many draws, or fights broken off after twenty or thirty bouts for various reasons. I measure by number of bouts exchanged. In &lt;em&gt;Romance&lt;/em&gt;, 100 bouts is generally the upper limit; fighters may rest and resume for another 100, as with Ma Chao and Xu Chu.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Wei Side
 &lt;div id="wei-side" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wei-side" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Cao Cao: Military strategist, statesman, man of letters. Extraordinarily fond of talent, shrewd himself, maxed out in both intelligence and ruling ability. The man who won the Central Plains battle royale. Welcoming Emperor Xian and establishing military farms (túntián) were both pivotal moves. There&amp;rsquo;s too much to say&amp;hellip; Everyone knows &amp;ldquo;a crafty hero in turbulent times,&amp;rdquo; but few mention &amp;ldquo;an able minister in peaceful times.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Xun Yu: Cao Cao&amp;rsquo;s key early strategist, intellect no less than Guo Jia. Loyal to the Han dynasty to the end. Killed by Cao Cao.&lt;/p&gt;
&lt;p&gt;Xun You: Wei&amp;rsquo;s strategist in the humiliating Red Cliffs campaign. Has intelligence but a notch below Guo Jia and Xun Yu.&lt;/p&gt;
&lt;p&gt;Guo Jia: Flawless. The number one grand-strategy adviser, relying on intellect rather than mysticism. Universally beloved. Died of illness while accompanying Cao Cao on the northern campaign. After Cao Cao&amp;rsquo;s defeat at Red Cliffs, he wept that Fengxiao (Guo Jia) was no longer with them — all others hung their heads in shame.&lt;/p&gt;
&lt;p&gt;Cheng Yu: A strategist who appeared frequently in the early period. High intelligence — personally, I&amp;rsquo;d rate him roughly on par with Cao Cao: top-tier, but below Xun Yu and Guo Jia. At Red Cliffs, he saw through the southeast wind issue but was talked down by Cao Cao.&lt;/p&gt;
&lt;p&gt;Jia Xu: Adviser to Li Jue, later joined Zhang Xiu, later surrendered to Cao Cao. An important mid-period Wei strategist.&lt;/p&gt;
&lt;p&gt;Xu Chu: Wei&amp;rsquo;s top-tier solo combat god. Captured He Yi alive in one bout. Fought Ma Chao for 200 bouts — the bare-chested war god. During the Hanzhong campaign, drunk on grain-transport duty, he was slow to react and got stabbed in the shoulder by Zhang Fei. Limited appearances after that.&lt;/p&gt;
&lt;p&gt;Dian Wei: Wei&amp;rsquo;s top-tier solo combat god. Master of twin halberds. Fought Xu Chu for two full &lt;em&gt;shíchén&lt;/em&gt; (four hours). Felt like Cao Cao&amp;rsquo;s personal bodyguard. Killed during Zhang Xiu&amp;rsquo;s rebellion. Limited combat record.&lt;/p&gt;
&lt;p&gt;Cao Ang: Cao Cao&amp;rsquo;s eldest son by Lady Liu. Killed during Zhang Xiu&amp;rsquo;s rebellion. Gave his horse to his father to ride, couldn&amp;rsquo;t escape himself. After the battle, Cao Cao wept only for Dian Wei, not for Cao Ang&amp;hellip;&lt;/p&gt;
&lt;p&gt;Cao Pi: Cao Cao&amp;rsquo;s eldest son by Lady Bian. One of the Three Caos. Proclaimed himself emperor immediately after Cao Cao&amp;rsquo;s death. Defeated at Hefei.&lt;/p&gt;
&lt;p&gt;Cao Zhang: Cao Cao&amp;rsquo;s second son by Lady Bian. Has combat achievements — defeated Liu Feng in three bouts. A pure warrior archetype. &amp;ldquo;A real man should emulate great generals like Wei Qing and Huo Qubing, leading a hundred thousand troops across the desert, driving out the barbarians, building a legacy of achievement — who would want to be a scholar?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Cao Zhi: Cao Cao&amp;rsquo;s third son by Lady Bian. One of the Three Caos. &amp;ldquo;Vain and flashy, lacking sincerity, addicted to wine and unrestrained.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Cao Xiong: Cao Cao&amp;rsquo;s fourth son by Lady Bian. Killed in the power struggle when Cao Pi succeeded to the throne.&lt;/p&gt;
&lt;p&gt;Cao Chong: Not mentioned in the novel.&lt;/p&gt;
&lt;p&gt;Cao Ren: A commanding general, no solo combat record, but a master of city defense. (His troops) shot Zhou Yu at Nanjun; (his troops) shot Guan Yu at Fancheng. Died during Cao Pi&amp;rsquo;s reign.&lt;/p&gt;
&lt;p&gt;Cao Hong: Often appears leading troops. Fought He Man for fifty bouts and killed him in single combat. Personally killed Yuan Tan. Rescued Cao Cao at a critical moment. Fought Ma Chao for fifty bouts — his blade technique grew disordered, his strength failing. With Cao Xiu, forced the Han Emperor to abdicate. No further appearances.&lt;/p&gt;
&lt;p&gt;Xiahou Dun: Fierce and bold. Took an arrow to the eye and swallowed his own eyeball. Fought Gao Shun for fifty bouts — victory. During &amp;ldquo;Crossing Five Passes and Slaying Six Generals,&amp;rdquo; he challenged Guan Yu to a duel, interrupted by Zhang Liao. The Wei protagonist at Bowang Slope. Died of illness during Cao Pi&amp;rsquo;s reign.&lt;/p&gt;
&lt;p&gt;Xiahou Yuan: Master of long-distance rapid strikes (couldn&amp;rsquo;t find the original quote). Many appearances leading troops — a general-type commander. Later killed by Huang Zhong at Mount Dingjun.&lt;/p&gt;
&lt;p&gt;Zhang Liao: Leader of the Five Elite Generals. Formerly under Lü Bu; close friends with Guan Yu. First-rate at leading troops, decent at solo combat. While accompanying Cao Pi against Wu, shot in the waist and killed by Wu officer Ding Feng.&lt;/p&gt;
&lt;p&gt;Zhang He: Of the Five Elite Generals. Framed by Guo Tu at Guandu; defected to Cao Cao. Defeated by Zhang Fei (at Zhang Fei&amp;rsquo;s tomb in Langzhong you can still see Zhang Fei&amp;rsquo;s inscription &amp;ldquo;Great Victory over Zhang He&amp;rsquo;s Forces&amp;rdquo;). Seems only able to trade a few dozen bouts with Zhang Fei — solo combat: average; commanding troops: first-rate. More appearances in the early period. Later, pursuing too deep, killed by Kongming&amp;rsquo;s massed crossbows at Jianmen Pass.&lt;/p&gt;
&lt;p&gt;Xu Huang: Of the Five Elite Generals. Appears so often it&amp;rsquo;s impossible to recount everything. During Li Jue and Guo Si&amp;rsquo;s rebellion, served under Yang Feng, later defected to Cao Cao. Fought Xu Chu for fifty bouts — solo combat: decent. Also close friends with Guan Yu (during the Yan Liang-Wen Chou incident, Zhang Liao and Xu Huang fought poorly; Guan Yu stepped up and cut each down in one stroke — presumably this is when they became friends&amp;hellip;). With Cao Ren, jointly defeated Guan Yu&amp;rsquo;s Jing Province army. Later, when Meng Da rebelled again, was shot in the forehead and died.&lt;/p&gt;
&lt;p&gt;Yue Jin: Of the Five Elite Generals. Also appears very frequently, often leading troops. Fought Lü Bu&amp;rsquo;s officer Zang Ba for thirty bouts; fought Ling Tong for fifty bouts — solo combat: average. During the Hefei campaign against Sun Quan, while dueling Ling Tong, Cao Xiu shot Ling Tong off his horse; Gan Ning then shot Yue Jin in the face with a single arrow. Never appears again — unclear if he recovered.&lt;/p&gt;
&lt;p&gt;Yu Jin: Of the Five Elite Generals. During Zhang Xiu&amp;rsquo;s rebellion, when people accused Yu Jin of defecting, he didn&amp;rsquo;t first clear his name but instead set up camp to resist the enemy — praised by Cao Cao. When Fancheng was besieged, he led reinforcements. Afraid Pang De would steal glory, he engaged in various petty maneuvers. Badly positioned his troops; Guan Yu flooded them and captured him. Yu Jin surrendered. After Lü Meng took Jing Province, he released the imprisoned Yu Jin back to Wei. Later scorned by Cao Pi; died in despondency.&lt;/p&gt;
&lt;p&gt;Pang De: Previously under Ma Chao. Extraordinarily brave — a personal favorite. Carried his own coffin into battle. Could fight Guan Yu for 100 bouts — the highest honor in the solo-combat world. His reputation doesn&amp;rsquo;t match the Five Tiger Generals, Xu Chu, or Dian Wei, but I personally believe his solo combat ability is on the same level. Unfortunately, never truly utilized. Then dragged down by his deadweight teammate Yu Jin: Guan Yu flooded seven armies and captured him. Refused to submit to Guan Yu, refused to surrender, was executed. A true hero.&lt;/p&gt;
&lt;p&gt;Li Dian: Frequently appears leading troops in the early-mid period. Captured Huang Shao alive. Solo combat: exchanged about ten bouts with Zhao Yun, realized he was outmatched, turned his horse and retreated. Never seen again after the Hefei campaign.&lt;/p&gt;
&lt;p&gt;Lady Zhen: Yuan Xi&amp;rsquo;s wife. After Yuan Shao&amp;rsquo;s defeat, Cao Pi snatched her and made her empress.&lt;/p&gt;
&lt;p&gt;Sima Yi: Late-period god-tier grand strategist. Can read stars. Can even grab a blade and solo. Fought Zhuge Liang to a standstill around Hanzhong. Never lost to Liang at the grand strategic level. Later seized Cao Shuang&amp;rsquo;s military power; the Sima clan took control of Wei.&lt;/p&gt;
&lt;p&gt;Sima Shi: Sima Yi&amp;rsquo;s eldest son. His characterization in the late period is relatively well done. &amp;ldquo;Round face, large ears, square mouth, thick lips. Under his left eye grew a black mole, from which sprouted dozens of black hairs.&amp;rdquo; While battling Wen Yang, &amp;ldquo;his eyeball burst out from the mole&amp;rsquo;s wound, blood streaming across the ground. In unbearable agony, yet fearing it would unsettle the troops, he merely bit his quilt and endured — biting the quilt to shreds.&amp;rdquo; Then bedridden. Shortly after, &amp;ldquo;with a great cry, his eye burst forth, and he died.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Sima Zhao: Sima Yi&amp;rsquo;s second son. Prince of Jin.&lt;/p&gt;
&lt;p&gt;Sima Yan: Sima Zhao&amp;rsquo;s son. Emperor of Jin.&lt;/p&gt;
&lt;p&gt;Deng Ai: Late-period undefeated war god. Fought Jiang Wei to a standstill, never lost at the grand strategic level. Rolled down a cliff wrapped in felt, launched a surprise raid into Shu — the Shu people thought divine soldiers had descended from heaven and opened their gates in surrender. The man who conquered Shu.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Shu Side
 &lt;div id="shu-side" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shu-side" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Liu Bei: Everyone says Liu Bei&amp;rsquo;s benevolence was fake — personally, I think that&amp;rsquo;s an embellishment. From the novel&amp;rsquo;s portrayal of Xuande, Bei was genuinely benevolent. If he&amp;rsquo;d just taken Liu Biao&amp;rsquo;s resources in Jing Province directly, none of that mess would&amp;rsquo;ve happened. When entering Shu, the outcome was indeed duplicitous, but the novel still portrays Xuande with benevolence — I choose to respect the original here.&lt;/p&gt;
&lt;p&gt;Guan Yu: A wildly popular character. Personally not a fan (in real life, this kind of person is extremely annoying). Early period: a god. Cut down foes in a single stroke. Single-handedly drove back Xu Huang + Xu Chu (probably only Lü Bu could match that feat). Arrogant and rude. Bears the lion&amp;rsquo;s share of blame for the loss of Jing Province. The loss of Jing is the novel&amp;rsquo;s plot turning point — Cao Cao, Liu Bei, Zhang Fei, Huang Zhong all die in rapid succession; remaining generals and strategists all fade from the storyline. Also, this man has an arrow-magnet constitution: shot during Crossing Five Passes, shot by an &amp;ldquo;air arrow&amp;rdquo; at Changsha fighting Huang Zhong, shot fighting Pang De, shot with a poisoned arrow attacking Fancheng.&lt;/p&gt;
&lt;p&gt;Zhang Fei: Zhang Fei is a highly stylized character, but his combat record is better than Guan Yu&amp;rsquo;s. &amp;ldquo;Round-eyed rogue,&amp;rdquo; brave and cunning, hates evil like an enemy, true to his nature. The only person in the entire Three Kingdoms who dares to taunt Lü Bu. Can fight Lü Bu for 100 bouts. Drank off Cao Cao&amp;rsquo;s army at Changban Slope. Marched into Shu by land. Honorably released Yan Yan. Shattered Zhang He. Stabbed and wounded Xu Chu. Can lead troops, can solo, has tactical intelligence — a top-tier Three Kingdoms general. One scene is especially moving: after Guan Yu&amp;rsquo;s death, Liu Bei kept delaying the revenge campaign. Zhang Fei said to Liu Bei: &lt;strong&gt;&amp;ldquo;Our brother is dead — what&amp;rsquo;s the point of being emperor?&amp;rdquo;&lt;/strong&gt; &amp;ldquo;If you won&amp;rsquo;t avenge our brother, don&amp;rsquo;t bother seeing me again.&amp;rdquo; 👍. I previously visited Zhang Fei&amp;rsquo;s tomb in Langzhong — his calligraphy was remarkably refined, nothing like the crude brute you&amp;rsquo;d imagine&amp;hellip;&lt;/p&gt;
&lt;p&gt;Zhuge Liang: A monster.&lt;/p&gt;
&lt;p&gt;Pang Tong: &amp;ldquo;Sleeping Dragon and Young Phoenix — obtain one and you can have the realm&amp;rdquo; is pure bluster. Cannot be ranked alongside Zhuge Liang. Combat record is basically negative.&lt;/p&gt;
&lt;p&gt;Xu Shu: God-tier grand strategist. Under Liu Bei (attached to Liu Biao), engineered the first-ever defeat of Cao Cao&amp;rsquo;s army (Cao Ren). Defining trait: filial piety&amp;hellip; Cheng Yu forged a letter from his mother to summon Yuanzhi. Xu Shu went to Cao Cao&amp;rsquo;s camp; after his mother committed suicide, Xu Shu, out of pride, still wouldn&amp;rsquo;t return to Liu Bei&amp;rsquo;s side&amp;hellip; utterly baffling.&lt;/p&gt;
&lt;p&gt;Fa Zheng: The strategist for Huang Zhong&amp;rsquo;s army when Xiahou Yuan was killed. Other schemes had no weaknesses. Died early. One of only two people Zhuge Liang ever sought advice from.&lt;/p&gt;
&lt;p&gt;Ma Su: The other person Zhuge Liang ever sought advice from. During the Southern Barbarian campaign, he was the first to propose a strategy aimed at winning hearts rather than annihilation. As long as he didn&amp;rsquo;t lead troops himself, god-tier. First time leading troops: defeated by Sima Yi. Later executed by the Chancellor. Also: the Chancellor shedding tears as he executed Ma Su — he wasn&amp;rsquo;t crying for Ma Su, but lamenting that the late Emperor&amp;rsquo;s legacy of the northern expedition remained unfulfilled.&lt;/p&gt;
&lt;p&gt;Zhao Yun: Never lost a solo fight. No one could go 100 bouts with him. Basically, he&amp;rsquo;d show up and &amp;ldquo;spear them dead in one thrust.&amp;rdquo; Evasion maxed out — &amp;ldquo;the hero of Changban Slope is still in his prime.&amp;rdquo; Rarely led troops; more like Liu Bei&amp;rsquo;s personal guard, protecting the imperial family. Liu Bei called him brother, but he never entered the core trio.&lt;/p&gt;
&lt;p&gt;Huang Zhong: Fought Guan Yu for 100 bouts at Changsha. His horse stumbled and Guan Yu spared him. Later shot an arrow without the arrowhead attached, repaying the debt. Killed Xiahou Yuan in the Hanzhong campaign.&lt;/p&gt;
&lt;p&gt;Ma Chao: &amp;ldquo;Splendid Ma Chao.&amp;rdquo; Cao Cao: &amp;ldquo;Ma Chao&amp;rsquo;s valor is no less than Lü Bu&amp;rsquo;s in his prime.&amp;rdquo; Nearly made Cao Cao cut off his beard and discard his robe in flight — Cao was saved by Cao Hong. Fought top-tier warriors Xu Chu and Zhang Fei for 200 bouts each. Rash and cruel; committed city massacres. Limited achievements under Liu Bei.&lt;/p&gt;
&lt;p&gt;Wei Yan: Had &amp;ldquo;a rebellious bone at the back of his skull.&amp;rdquo; An important Shu general in the mid-late period. Solo combat: decent. Leading troops: first-rate. Zhuge Liang predicted that after his death, Wei Yan would rebel — killed by Ma Dai. The famous Ziwu Valley gambit: though Sima Yi praised it, I personally think it&amp;rsquo;s a bit far-fetched.&lt;/p&gt;
&lt;p&gt;Yan Yan: Solo combat ability basically zero. Pummeled by Zhang Fei. Archery ability: top-tier — shot Zhang Fei&amp;rsquo;s helmet. Participated in the Hanzhong campaign. No further appearances.&lt;/p&gt;
&lt;p&gt;Huang Yueying: Actually, not much of a role. Only described when introducing Zhuge Liang&amp;rsquo;s son Zhuge Zhan: &amp;ldquo;The mother was exceedingly ugly but possessed extraordinary talents: versed in astronomy above, geography below; there was no book of strategy, divination, or escape arts she had not mastered.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Zhuge Zhan: Son of the Marquis of Wu (Zhuge Liang). Hyped up upon debut, then sent against Deng Ai — killed by Deng Ai.&lt;/p&gt;
&lt;p&gt;Liu Feng: Liu Bei&amp;rsquo;s adopted son. Solo combat ability: low. No tactical sense. Easily persuaded. A net negative. Guan Yu disliked him. Later, when Guan Yu was defeated and sought reinforcements, Liu Feng and Meng Da refused to send troops, contributing to Guan Yu&amp;rsquo;s death. Subsequently executed by Liu Bei.&lt;/p&gt;
&lt;p&gt;Meng Da: Betrayed, then betrayed again. Shares blame for Guan Yu&amp;rsquo;s death. Only highlight: shot and killed Xu Huang. Later killed by Sima Yi.&lt;/p&gt;
&lt;p&gt;Liu Bei&amp;rsquo;s Wives: Lady Gan — the one who threw herself into the well at Changban Slope, A-Dou&amp;rsquo;s birth mother. Lady Mi — died while Liu Bei was in Jing Province, which led to Wu&amp;rsquo;s marriage proposal. Sun Shangxiang — a fifty-something old ox marrying a sixteen-year-old girl&amp;hellip;&lt;/p&gt;
&lt;p&gt;Mi Zhu: Brother of Liu Bei&amp;rsquo;s wife Lady Mi. A tool character — basically Liu Bei&amp;rsquo;s envoy for delivering messages.&lt;/p&gt;
&lt;p&gt;Mi Fang: Brother of Liu Bei&amp;rsquo;s wife Lady Mi. Technically the Emperor&amp;rsquo;s brother-in-law, yet surrendered to Wu. Bears responsibility for Guan Yu&amp;rsquo;s death.&lt;/p&gt;
&lt;p&gt;Sun Qian, Jian Yong: Followed Liu Bei in the early period. No particular talent. Tool characters.&lt;/p&gt;
&lt;p&gt;Guan Ping: Guan Yu&amp;rsquo;s adopted son. Fought Pang De for thirty bouts. Later captured alongside Guan Yu by Wu; executed.&lt;/p&gt;
&lt;p&gt;Guan Xing: Guan Yu&amp;rsquo;s biological son. A key general in the mid-late period. Killed Pan Zhang — his father&amp;rsquo;s murderer — and recovered the Green Dragon Blade. A main combat general on the Qishan campaigns. Later died of illness.&lt;/p&gt;
&lt;p&gt;Zhang Bao: Zhang Fei&amp;rsquo;s biological son. A key general in the mid-late period. Appears alongside Guan Xing.&lt;/p&gt;
&lt;p&gt;Liao Hua: Originally a Yellow Turban, later followed Guan Yu. During the desperate escape from Mai Castle, ran out to seek reinforcements and survived. Later appears on the Qishan campaigns.&lt;/p&gt;
&lt;p&gt;Zhou Cang: Originally under Zhang Bao, later followed Guan Yu — carried Guan Yu&amp;rsquo;s blade. Fought Zhao Yun and lost repeatedly, taking three spear wounds. Solo combat: weak. Committed suicide after Guan Yu&amp;rsquo;s death.&lt;/p&gt;
&lt;p&gt;Ma Dai: A late-period Shu general. Frequent appearances. Achievements in the Southern Barbarian campaign and Qishan expeditions. Under the Chancellor&amp;rsquo;s brocade-bag stratagem, executed Wei Yan.&lt;/p&gt;
&lt;p&gt;Jiang Wei: A Wei defector. Inherited Zhuge Liang&amp;rsquo;s will. Accomplished in both letters and arms. Launched (I think ten) expeditions from Qishan&amp;hellip; Later, Deng Ai raided Shu; Liu Shan surrendered. Jiang Wei was still holding Jianmen Pass&amp;hellip; &amp;ldquo;We fight to the death — why do you surrender first!&amp;rdquo;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Wu Side
 &lt;div id="wu-side" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wu-side" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Sun Jian: Among the useless warlord coalition, capable of fighting. Obsessed with the Imperial Seal. Swore he didn&amp;rsquo;t have the Seal — &amp;ldquo;may I be shot dead by random arrows.&amp;rdquo; Later shot dead by Huang Zu&amp;rsquo;s troops.&lt;/p&gt;
&lt;p&gt;Sun Ce: Sun Ce the Little Conqueror. A fierce warrior. Essentially conquered all of Jiangdong single-handedly — just died too young. A personal favorite. Trading the Imperial Seal for troops to build his kingdom — a stroke of genius, surpassing his father. Killed by an enemy&amp;rsquo;s revenge attack. While recovering, because he refused to believe in superstition, was mystically killed by Yu Ji.&lt;/p&gt;
&lt;p&gt;Yu Ji: The people thought he was an immortal. Could cure people with talisman water. Executed by Sun Ce. His ghost haunted Sun Ce and killed him&amp;hellip;&lt;/p&gt;
&lt;p&gt;Sun Quan: Zero military talent whatsoever. Pummeled at Hefei. His strong point: recognizing talent. All four of Wu&amp;rsquo;s early (and most important) Grand Commanders were strong.&lt;/p&gt;
&lt;p&gt;Taishi Ci: Appeared quite early — already present when Liu Bei was helping Tao Qian. Later joined Liu Yao, then was subdued by Sun Ce. Later, done in by Sun Quan at Hefei&amp;hellip;&lt;/p&gt;
&lt;p&gt;Gan Ning: &amp;ldquo;Brocade Sail Pirate.&amp;rdquo; Wu&amp;rsquo;s number one combat power. Expert archer. (Honestly, Wu&amp;rsquo;s generals&amp;rsquo; combat ability is not impressive.)&lt;/p&gt;
&lt;p&gt;Ling Tong: His father Ling Cao was killed by Gan Ning — a blood feud. During the Hefei campaign, saved by Gan Ning; they reconciled.&lt;/p&gt;
&lt;p&gt;Huang Gai: Master of getting beaten. Actually, no combat highlights on the battlefield. At Red Cliffs, shot off his boat by Zhang Liao with one arrow, rescued by Zhou Yu. No further news.&lt;/p&gt;
&lt;p&gt;Zhou Yu: Wu&amp;rsquo;s first Grand Commander. A winner in life. A personal favorite. Too bad: &amp;ldquo;Since Heaven gave birth to Yu, why did it also give birth to Liang?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Zhang Hong: No role.&lt;/p&gt;
&lt;p&gt;Zhang Zhao: Leader of the dove faction. Default answer to everything: surrender. Virtually none of his schemes ever worked.&lt;/p&gt;
&lt;p&gt;Lu Su: Wu&amp;rsquo;s second Grand Commander. Timid. Appreciated by Zhou Yu. No combat achievements. Many say he was &amp;ldquo;shrewd beneath a foolish exterior&amp;rdquo; — the original does have hints of this, but &amp;ldquo;shrewd beneath foolish&amp;rdquo; is a stretch. Basically just a messenger between Liang and Yu. The originator of the empty-handed Jing Province recovery attempts.&lt;/p&gt;
&lt;p&gt;Lü Meng: Wu&amp;rsquo;s third Grand Commander. Mastermind of &amp;ldquo;Crossing the River in White&amp;rdquo; (disguising troops as merchants). Can basically be considered the killer of Guan Yu. When he saw the beacon towers in Jing Province and couldn&amp;rsquo;t find a way to break through, he claimed illness and stayed home (absolutely hilarious). Later seen through by Lu Xun. After Guan Yu&amp;rsquo;s death, mystically killed by Guan Yu&amp;rsquo;s ghost.&lt;/p&gt;
&lt;p&gt;Lu Xun: Wu&amp;rsquo;s fourth Grand Commander. Mastermind of Yiling. Later participated in several major campaigns. Lu Xun&amp;rsquo;s talent was not beneath Gongjin&amp;rsquo;s (Zhou Yu&amp;rsquo;s).&lt;/p&gt;
&lt;p&gt;Zhou Tai: Fought his way in and out to rescue Sun Quan. For every wound he bore, Sun Quan made him drink a cup of wine.&lt;/p&gt;
&lt;p&gt;Pan Zhang: Fought Guan Yu — lasted only three bouts before fleeing. Guan Yu&amp;rsquo;s spirit manifested; killed by Guan Xing.&lt;/p&gt;
&lt;p&gt;Ding Feng: Shot and killed Zhang Liao. An important late-period Wu general. Survived until chapter 119.&lt;/p&gt;
&lt;p&gt;Ma Zhong: Pan Zhang&amp;rsquo;s subordinate. Many have never heard of this character, but he killed both Guan Yu and Huang Zhong. Killing Guan Yu was cleaning up; killing Huang Zhong was genuine skill — one arrow took down the master archer Huang Zhong. Later assassinated by Mi Fang.&lt;/p&gt;
&lt;p&gt;Jiang Qin, Han Dang, Xu Sheng&amp;hellip;: Too many, unremarkable, can&amp;rsquo;t remember.&lt;/p&gt;
&lt;p&gt;Sun Shangxiang: No such character exists in the official histories. In the novel, Sun Shangxiang has only personality description — she likes dancing with blades and swords. She never actually participated in combat. No children after marrying Liu Bei. Later tricked into returning to Wu; never saw Liu Bei again. But later generations adore Sun Shangxiang — she&amp;rsquo;s a fan-favorite character. Total War&amp;rsquo;s beauty icon:



&lt;img src="https://lastdba.com/img/csdn/b103703b92df.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Others
 &lt;div id="others" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#others" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;He Jin: General-in-Chief, Empress He&amp;rsquo;s brother. An utter fool. Held all the cards and played them terribly. To deal with the Ten Regular Attendants, he summoned Dong Zhuo to the capital, setting off an unstoppable chain reaction — the realm fell into chaos.&lt;/p&gt;
&lt;p&gt;Zhang Jiao, Zhang Bao, Zhang Liang: Yellow Turban rebel leaders. Could cure with talisman water, summoned divine soldiers. The rest of the time, basically got beaten up by the regular army. A religious peasant uprising, hastily concluded. The novel dismisses it with &amp;ldquo;Zhang Jiao was already dead.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Yuan Shao: Previously under He Jin — back then he was quite strategic, even dared to confront Dong Zhuo directly. &amp;ldquo;Many schemes but poor decisions.&amp;rdquo; His advisers each pushed their own agenda; none of his generals were worth anything.&lt;/p&gt;
&lt;p&gt;Yuan Tan, Yuan Xi, Yuan Shang: Yuan Shao&amp;rsquo;s three sons. Still fighting over power after Yuan Shao&amp;rsquo;s defeat.&lt;/p&gt;
&lt;p&gt;Lü Bu: The number one warrior of the Three Kingdoms. Early period: unstoppable in single combat. Only lost when ganged up on. (Late period Lü Bu once soloed Zhang Fei.) Cao Cao had suffered at Lü Bu&amp;rsquo;s hands — ultimately Cao Cao was the big winner.&lt;/p&gt;
&lt;p&gt;Chen Gong: After Cao Cao&amp;rsquo;s failed assassination of Dong Zhuo, Chen Gong followed him — and witnessed Cao Cao&amp;rsquo;s treachery: &amp;ldquo;Better that I betray the world than let the world betray me.&amp;rdquo; Disgusted with Cao Cao, he left and later joined Lü Bu.&lt;/p&gt;
&lt;p&gt;Zhang Song: Liu Zhang&amp;rsquo;s subordinate. Arrogant. Cao Cao disliked him. Later defected to Liu Bei, offered the map of Western Sichuan. Later discovered by Liu Zhang colluding with Liu Bei; executed.&lt;/p&gt;
&lt;p&gt;Zhang Xiu: Featured prominently in early battles against Cao Cao. Originally surrendered to Cao Cao, but because his aunt was forcibly taken by Cao Cao, he rebelled. Cao Ang and Dian Wei died in this battle. Later defeated by Cao Cao again; surrendered.&lt;/p&gt;
&lt;p&gt;Chunyu Qiong: Commander of the Wuchao supply depot. Drinking ruined everything.&lt;/p&gt;
&lt;p&gt;Li Ru: Dong Zhuo&amp;rsquo;s strategist. Never appears again after Dong Zhuo&amp;rsquo;s death.&lt;/p&gt;
&lt;p&gt;Zuo Ci: A full chapter of pure mysticism. Stunned everyone reading it — &amp;ldquo;Come out and see the immortal~&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Yu Ji (the physician): Cao Cao&amp;rsquo;s doctor. Tried to poison Cao Cao; discovered.&lt;/p&gt;
&lt;p&gt;Hua Tuo: The Three Kingdoms&amp;rsquo; number one physician. Skilled in surgery. Treated Zhou Tai&amp;rsquo;s wounds. Scraped Guan Yu&amp;rsquo;s bones to cure poison. Later, while treating Cao Cao&amp;rsquo;s head ailment, suspected of being a second Yu Ji — died in prison.&lt;/p&gt;
&lt;p&gt;Chen Lin: One of the Seven Masters of the Jian&amp;rsquo;an period. His &amp;ldquo;Proclamation Against the Usurper&amp;rdquo; is recommended reading in full.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Final Thoughts
 &lt;div id="final-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#final-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;There&amp;rsquo;s simply too much to discuss. I thought I could finish this piece in two or three hours — ended up costing several times that. &lt;em&gt;Romance of the Three Kingdoms&lt;/em&gt; is truly brilliant, absolutely worth reading (as if that needed saying). I probably won&amp;rsquo;t continue with the &lt;em&gt;Records&lt;/em&gt; — time to urgently start the next chapter&amp;hellip;&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Space Odyssey Series</title><link>https://lastdba.com/en/2024/08/12/book-notes-space-odyssey-series/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-space-odyssey-series/</guid><description>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b4ea64acd991.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;An unavoidable work for any sci-fi fan: Arthur C. Clarke&amp;rsquo;s classic — the &lt;em&gt;Space Odyssey&lt;/em&gt; series. The &lt;em&gt;Space Odyssey&lt;/em&gt; consists of four volumes: &lt;em&gt;2001: A Space Odyssey&lt;/em&gt;, &lt;em&gt;2010: Odyssey Two&lt;/em&gt;, &lt;em&gt;2061: Odyssey Three&lt;/em&gt;, and &lt;em&gt;3001: The Final Odyssey&lt;/em&gt;. As the titles suggest, the futuristic technological visions take place in their respective years. Don&amp;rsquo;t think 2001 has already passed — when Clarke wrote &lt;em&gt;2001&lt;/em&gt;, it was 1968! I, at least, can&amp;rsquo;t imagine what the world will look like thirty years from now, or how far humanity will have advanced in space exploration.&lt;/p&gt;</description><content:encoded>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b4ea64acd991.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;An unavoidable work for any sci-fi fan: Arthur C. Clarke&amp;rsquo;s classic — the &lt;em&gt;Space Odyssey&lt;/em&gt; series. The &lt;em&gt;Space Odyssey&lt;/em&gt; consists of four volumes: &lt;em&gt;2001: A Space Odyssey&lt;/em&gt;, &lt;em&gt;2010: Odyssey Two&lt;/em&gt;, &lt;em&gt;2061: Odyssey Three&lt;/em&gt;, and &lt;em&gt;3001: The Final Odyssey&lt;/em&gt;. As the titles suggest, the futuristic technological visions take place in their respective years. Don&amp;rsquo;t think 2001 has already passed — when Clarke wrote &lt;em&gt;2001&lt;/em&gt;, it was 1968! I, at least, can&amp;rsquo;t imagine what the world will look like thirty years from now, or how far humanity will have advanced in space exploration.&lt;/p&gt;
&lt;p&gt;I wrote a reading reflection after first finishing &lt;em&gt;2001&lt;/em&gt;, captivated by its premise, its thrilling space plotlines, its fantastical cosmic backdrop&amp;hellip; I immediately dove into the remaining three volumes. I initially expected the setting to expand ever outward, but that&amp;rsquo;s not what happened. The later three books remain within this cosmic dimension — between Jupiter and Earth — which is already very, very small. They mostly fill in plot details and imagination, bringing the entire story to completion.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Tetralogy
 &lt;div id="the-tetralogy" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-tetralogy" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;2001: A Space Odyssey
 &lt;div id="2001-a-space-odyssey" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2001-a-space-odyssey" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;After finishing the entire series, this one still feels like the most classic. Maybe it&amp;rsquo;s because the plot was the result of discussions with Kubrick&amp;hellip;&lt;/p&gt;
&lt;p&gt;Since I&amp;rsquo;ve already written a full reflection on it before, I won&amp;rsquo;t belabor it here. Interested friends can check out my earlier &lt;a href="https://mp.csdn.net/mp_blog/creation/editor/130515764" target="_blank" rel="noreferrer"&gt;Book Notes — 2001: A Space Odyssey&lt;/a&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;2010: Odyssey Two
 &lt;div id="2010-odyssey-two" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2010-odyssey-two" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;This volume is also brilliant. In the original, China&amp;rsquo;s spacecraft sent to explore Jupiter is named the &lt;em&gt;Qian Xuesen&lt;/em&gt;, and the &lt;em&gt;Qian Xuesen&lt;/em&gt; is the first manned mission to land on and explore Jupiter&amp;rsquo;s moon — Europa — beating the Americans to it! Though the outcome wasn&amp;rsquo;t great, the plot is thrilling~ Before the &lt;em&gt;Qian Xuesen&lt;/em&gt;&amp;rsquo;s accident, the astronauts described lower life forms on Europa and were ultimately attacked and killed by &amp;ldquo;extraterrestrial organisms.&amp;rdquo; This &amp;ldquo;disaster&amp;rdquo; plotline sparks infinite imagination: what kind of life exists on Europa? And what should we humans do about it?&lt;/p&gt;
&lt;p&gt;Finally, the monolith on Jupiter goes through a series of self-replications and ultimately transforms Jupiter into a white dwarf! Jupiter is ignited! From then on, there are two &amp;ldquo;suns&amp;rdquo; in the sky. This premise is just fantastic~&lt;/p&gt;

&lt;h3 class="relative group"&gt;2061: Odyssey Three
 &lt;div id="2061-odyssey-three" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2061-odyssey-three" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;This one feels a bit rushed, mainly because Halley&amp;rsquo;s Comet was coming. Clarke wrote in the preface that since Halley&amp;rsquo;s Comet was about to sweep past Earth, if he didn&amp;rsquo;t release the book soon, the exploration-of-Halley plotline might become untimely. Indeed, a large portion of this volume is devoted to exploring Halley&amp;rsquo;s Comet. There are some Jupiter-related plotlines, but they don&amp;rsquo;t advance the main narrative much.&lt;/p&gt;
&lt;p&gt;Halley&amp;rsquo;s Comet orbits the sun once every 76 years. Its next return is about 40 years away (July 28, 2061). Thinking back, its last perihelion was roughly when this book was written — the whole world was talking about Halley&amp;rsquo;s Comet. (I can feel that no one&amp;rsquo;s mentioned it in recent years.)&lt;/p&gt;

&lt;h3 class="relative group"&gt;3001: The Final Odyssey
 &lt;div id="3001-the-final-odyssey" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#3001-the-final-odyssey" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A perfect concluding work! This conclusion has influenced countless sci-fi novels — you can even clearly sense the shadow of &lt;em&gt;The Three-Body Problem&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The first three volumes are all still in the 21st century. &lt;em&gt;3001&lt;/em&gt; jumps a full thousand years! Humanity now acquires knowledge through &amp;ldquo;brain-computer interfaces&amp;rdquo; rather than learning; the speed of space travel has increased enormously&amp;hellip;&lt;/p&gt;
&lt;p&gt;But honestly, a thousand years — a thousand years and humanity has only progressed this far? I&amp;rsquo;d rather believe it was because of the sophons. Huh? Could it be that Old Liu&amp;rsquo;s sophons were inspired by this exact idea?&lt;/p&gt;
&lt;p&gt;The most brilliant part of this volume is humanity resurrecting Poole — an astronaut killed by HAL in the first book. If no one brought him up, you&amp;rsquo;d assume he was still drifting in space&amp;hellip; Resurrecting Poole not only echoes the first book&amp;rsquo;s plot but also allows us to observe and unveil the human world of the year 3001 through the eyes of an &amp;ldquo;ancient person.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Shadow of The Three-Body Problem
 &lt;div id="the-shadow-of-the-three-body-problem" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-shadow-of-the-three-body-problem" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Or to put it the other way around: the shadow of &lt;em&gt;Space Odyssey&lt;/em&gt; in &lt;em&gt;The Three-Body Problem&lt;/em&gt;. I tried recalling from memory — apologies for any omissions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Sower — the Singer. In &lt;em&gt;Space Odyssey&lt;/em&gt;, the Overlords are &amp;ldquo;planting&amp;rdquo; life; in &lt;em&gt;The Three-Body Problem&lt;/em&gt;, the Overlords casually &amp;ldquo;eliminate&amp;rdquo; life — &amp;ldquo;What does it have to do with you?&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Alien warning. &amp;ldquo;Stay away from Europa&amp;rdquo; — &amp;ldquo;Do not answer! Do not answer! Do not answer!&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Display of alien technology. The Monolith — the Droplet. Both are materials beyond human comprehension, impossibly smooth, artifacts of alien civilizations that humanity&amp;rsquo;s technology cannot fathom. They represent the vast gap between human and alien technological levels.&lt;/li&gt;
&lt;li&gt;Alien life is coming. We have time to catch our breath, but it seems like nothing we do will matter.&lt;/li&gt;
&lt;li&gt;Resistance plans. With alien beings about to arrive, people begin formulating resistance plans. At this point, humanity still doesn&amp;rsquo;t know what the enemy truly looks like.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In fact, the two works differ greatly in many ways. &lt;em&gt;Space Odyssey&lt;/em&gt; is about cosmic exploration, while &lt;em&gt;The Three-Body Problem&lt;/em&gt; is about human society as a whole facing alien civilization. &lt;em&gt;Space Odyssey&lt;/em&gt; essentially has only a handful of protagonists, even across a thousand years, and the plot mainly revolves around Jupiter. &lt;em&gt;The Three-Body Problem&lt;/em&gt; has a grander scale and far more characters&amp;hellip;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Final Thoughts
 &lt;div id="final-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#final-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;ve finally finished the &lt;em&gt;Space Odyssey&lt;/em&gt; tetralogy. You can truly feel it&amp;rsquo;s a monumental work of science fiction — it satisfies a sci-fi fan&amp;rsquo;s longing for the &amp;ldquo;exploration&amp;rdquo; of space. Before any space agency had begun exploring &amp;ldquo;there,&amp;rdquo; Arthur C. Clarke had already arrived. NASA astronauts would even write back to Clarke: &amp;ldquo;We photographed the far side of the moon. There were no monoliths, no anomalies&amp;rdquo; — almost as if saying, &amp;ldquo;You fraud, I went there precisely because I read your book!&amp;rdquo; Haha~&lt;/p&gt;
&lt;p&gt;Clarke wrote many plotlines about the &lt;em&gt;Qian Xuesen&lt;/em&gt; spacecraft in the novels, and in the afterwords of several volumes, he repeatedly emphasized that Qian Xuesen was a person who profoundly influenced the aerospace industry — both in China and the United States. The U.S. arrested him on fabricated charges, and Qian Xuesen ultimately returned to his homeland to build its aerospace program from scratch, influencing missile development. During a trip to Beijing, Clarke even made a special attempt to visit Qian Xuesen, but at the time, Qian&amp;rsquo;s health was poor, and his doctors wouldn&amp;rsquo;t permit visitors. Clarke entrusted someone to deliver an autographed copy of &lt;em&gt;Space Odyssey&lt;/em&gt; to Qian.&lt;/p&gt;
&lt;p&gt;Reading the entire series, you can feel the era&amp;rsquo;s obsession with space exploration. But after the Apollo program was shut down, people seemed to lose interest in space altogether. However, with Musk&amp;rsquo;s Mars colonization plans, the theme of &amp;ldquo;space&amp;rdquo; seems to be returning to public consciousness. NASA says they&amp;rsquo;ll land on Mars by 2040 — who knows if it&amp;rsquo;s true. I&amp;rsquo;ll come back to dig up this post then.&lt;/p&gt;</content:encoded></item><item><title>Book Notes — To Kill a Mockingbird</title><link>https://lastdba.com/en/2024/08/12/book-notes-to-kill-a-mockingbird/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-to-kill-a-mockingbird/</guid><description>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c301867ed363.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Many people probably know &lt;em&gt;To Kill a Mockingbird&lt;/em&gt;. I saw its ratings were sky-high and couldn&amp;rsquo;t resist picking it up. Sure enough, the story is brilliant — never a dull moment. Its style is quite different from the books I&amp;rsquo;d read before. Personally, I think it&amp;rsquo;s perfectly suited for middle school readers (no condescension intended) — a simple, fun, and superbly written story. In truth, the message the whole book wants to convey is very clear: don&amp;rsquo;t harm innocent people. The real difficulty lies in how to build a brilliant story around such a simple idea.&lt;/p&gt;</description><content:encoded>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c301867ed363.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Many people probably know &lt;em&gt;To Kill a Mockingbird&lt;/em&gt;. I saw its ratings were sky-high and couldn&amp;rsquo;t resist picking it up. Sure enough, the story is brilliant — never a dull moment. Its style is quite different from the books I&amp;rsquo;d read before. Personally, I think it&amp;rsquo;s perfectly suited for middle school readers (no condescension intended) — a simple, fun, and superbly written story. In truth, the message the whole book wants to convey is very clear: don&amp;rsquo;t harm innocent people. The real difficulty lies in how to build a brilliant story around such a simple idea.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Prose Style
 &lt;div id="prose-style" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#prose-style" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I think the best thing about &lt;em&gt;Mockingbird&lt;/em&gt; is its prose. The author brings a small-town story in the American South vividly to life. Several storylines are stunningly rendered, the plot follows the timeline smoothly without feeling muddled, and it reads effortlessly and comfortably. The story plants foreshadowing from the very beginning, only unearthing the biggest reveal right at the end. The depiction of the gap between white and Black lives is also extraordinarily vivid. This book&amp;rsquo;s setting is contemporaneous with the HBO series &lt;em&gt;Boardwalk Empire&lt;/em&gt;, which I&amp;rsquo;d watched before — that show also features Black neighborhoods, so I could easily picture the white and Black communities.&lt;/p&gt;
&lt;p&gt;One scene where the protagonist gets beaten up left a deep impression: &amp;ldquo;I was pressed to the ground, and before my eyes was a tiny ant, laboriously hauling a breadcrumb through the grass.&amp;rdquo; I find it hard to articulate exactly what this passage means. She&amp;rsquo;s being assaulted, yet her attention is caught by an ant carrying a breadcrumb? Maybe it means nothing? But whatever the case, this description makes almost everyone mentally highlight it — it&amp;rsquo;s so visually evocative. And it feels very much like stepping out for a cigarette after being immersed in stressful work for too long&amp;hellip; it yanks you from tense, urgent action into another quiet world, then immediately back again.&lt;/p&gt;
&lt;p&gt;Another Prime Minister story also left a deep impression. The young protagonist asks her father: &amp;ldquo;What&amp;rsquo;s a &amp;lsquo;whore&amp;rsquo;?&amp;rdquo; Her father tells her a story about a Prime Minister blowing a feather: &amp;ldquo;Every day the Prime Minister sits in the House of Commons blowing a feather toward the ceiling, straining every sinew to keep it from drifting down — yet people around him keep losing their heads one after another.&amp;rdquo; Reading this, I was just as baffled as the protagonist. What on earth does any of this have to do with anything? I only figured it out after consulting Baidu. Her father meant: don&amp;rsquo;t obsess over irrelevant things. Which is to say, her father offered no explanation at all. But by the time of the rape trial, the young protagonist understood perfectly — she knew what &amp;ldquo;rape&amp;rdquo; meant. It&amp;rsquo;s hard to say whether this kind of evasive education is right or wrong.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Who Killed Bob Ewell?
 &lt;div id="who-killed-bob-ewell" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#who-killed-bob-ewell" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The final chapters are brilliantly rendered. While reading, I felt completely immersed in that pitch-black schoolyard night, that son-of-a-bitch Ewell (that&amp;rsquo;s how the sheriff refers to Ewell in the book — the first time I read that line, I silently cursed him too&amp;hellip;) hunting down two innocent children&amp;hellip; In the end, Ewell dies, but the full truth of what happened isn&amp;rsquo;t entirely clear. The narrative is told from the young protagonist&amp;rsquo;s first-person perspective, but she doesn&amp;rsquo;t see who killed Ewell.&lt;/p&gt;
&lt;p&gt;Since I read the e-book version, I could see many readers&amp;rsquo; annotations and comments. I found that many people completely missed the key details of the case. I was also utterly confused after my first read-through. I reread the relevant sections several times and finally pieced together the author&amp;rsquo;s intent and the full sequence of events. Let me unravel this mystery through several key questions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1) Is Boo Radley Black or white?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This question seems absurd but is critically important. If Radley were Black, then Tom Robinson&amp;rsquo;s case would be a cautionary tale — a Black man killing a white man is enough to be executed several times over. The jury wouldn&amp;rsquo;t care about the truth; the defendant being Black would be sufficient for a guilty verdict. So the old father and old sheriff&amp;rsquo;s desire to protect Radley would be perfectly natural — putting Radley through the legal process would just be throwing away a good man&amp;rsquo;s life. This would also make the novel a work primarily about Black racism.&lt;/p&gt;
&lt;p&gt;But Radley is white. So none of the above applies. This also brings the novel&amp;rsquo;s content more in line with its title. The author never directly states that Radley is white, but you can put it this way: if the author doesn&amp;rsquo;t specify someone is Black, then they&amp;rsquo;re white~. Of course, there are other clues: Radley ran around with Cunningham boys (white) as a kid; he lives in a white neighborhood; his skin is deathly pale&amp;hellip; Radley is a character described from the very beginning to the very end, the most richly drawn &amp;ldquo;mockingbird&amp;rdquo; of the entire book — yet we only see his true face in the final two chapters. That&amp;rsquo;s why I felt so unsettled not knowing whether he was white&amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2) The gap in the action&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here is the passage where the young protagonist is pinned down by Ewell and ultimately saved:&lt;/p&gt;
&lt;p&gt;&amp;ldquo;He was slowly choking me, and I couldn&amp;rsquo;t move at all. Suddenly, he was yanked hard from behind and fell to the ground with a thud, nearly dragging me down with him. I thought, Jem must have gotten up.&lt;/p&gt;
&lt;p&gt;Sometimes, human reactions are sluggish. I stood there dumbly, like a mute. The sounds of struggle slowly subsided. Someone was panting heavily. The night returned to its prior stillness.&lt;/p&gt;
&lt;p&gt;&amp;hellip;I slowly realized there were four people under the tree now.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;From the moment Ewell is pulled away to the moment there are four people under the tree, a struggle took place. Afterward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Jem (the protagonist&amp;rsquo;s older brother) lies on the ground, injured by Ewell, unconscious&lt;/li&gt;
&lt;li&gt;Ewell (the man who tried to kill children) lies dead with a kitchen knife in his ribs&lt;/li&gt;
&lt;li&gt;Radley (the man who came to save the children) leans against a tree, coughing&lt;/li&gt;
&lt;li&gt;The protagonist stands frozen, still in shock&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &amp;ldquo;gap&amp;rdquo; refers to: who pulled Ewell away and killed him? What exactly happened? The subsequent discussion between Atticus and the sheriff revolves around reconstructing this gap.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3) The kitchen knife&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;First, the knife the sheriff uses for his demonstration is a switchblade, not the kitchen knife. &amp;ldquo;Was Ewell killed with this knife?&amp;rdquo; &amp;ldquo;No, that knife is still in him. From the handle, it&amp;rsquo;s a kitchen knife.&amp;rdquo; So the sheriff did not destroy the murder weapon — that&amp;rsquo;s a fact.&lt;/p&gt;
&lt;p&gt;In any homicide, the murder weapon is an extraordinarily critical piece of evidence. Clearly, this kitchen knife is the murder weapon. Whoever brought this kitchen knife is very likely the killer. The sheriff says, &amp;ldquo;Ewell probably found that kitchen knife somewhere in the dump&amp;hellip; sharpened it razor-sharp&amp;hellip; Ewell fell on his own knife.&amp;rdquo; This is the sheriff&amp;rsquo;s subjective speculation. There are many possibilities:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Ewell brought the knife, tripped himself, and the kitchen knife stabbed into his ribs — an accidental death.&lt;/li&gt;
&lt;li&gt;Ewell brought the knife; Jem, despite his broken arm, wrestled it away and killed him.&lt;/li&gt;
&lt;li&gt;Radley brought the knife and killed Ewell.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;First: the probability of Ewell bringing the knife is low. If he&amp;rsquo;d brought a knife, he could have just rushed up and stabbed them — there&amp;rsquo;d be no need to go to the trouble of twisting Jem&amp;rsquo;s arm and strangling the protagonist. Now:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Scenario 1: Ewell was already fighting someone (though it&amp;rsquo;s never explicitly stated with whom). An accidental death at this point seems far-fetched, but it can&amp;rsquo;t be entirely ruled out — though the probability is extremely low.&lt;/li&gt;
&lt;li&gt;Scenario 2: Broken-arm Jem wrestles the knife away and kills Ewell. This scenario is based on the protagonist saying &amp;ldquo;it felt like Jem pulled Ewell back&amp;rdquo; — so naturally it must have been Jem fighting Ewell. The protagonist didn&amp;rsquo;t see who pulled Ewell back; she only says it &amp;ldquo;felt like&amp;rdquo; him. A thirteen-year-old boy with a freshly broken arm taking a knife from an adult and killing him — also an extremely low probability.&lt;/li&gt;
&lt;li&gt;Scenario 3: Radley brought the knife, yanked Ewell back to stop him from strangling the protagonist, then stabbed Ewell to death. This is the most likely scenario — and precisely the scenario that Atticus and the sheriff &amp;ldquo;deliberately&amp;rdquo; avoid mentioning during their reconstruction. One detail supports this: before Ewell burst out, both children had screamed. The neighbors probably didn&amp;rsquo;t hear — but earlier in the book, it&amp;rsquo;s mentioned that the tree at the scene is very close to Radley&amp;rsquo;s house.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;4) The reconstruction&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Whether viewed through the novel&amp;rsquo;s themes and atmosphere, or through specific case analysis, it&amp;rsquo;s almost certain that the person who killed Ewell was Mr. Boo Radley.&lt;/p&gt;
&lt;p&gt;The reconstruction dialogue between Atticus and the sheriff — I reread it multiple times; it&amp;rsquo;s absolutely fascinating. It traces the entire process of reasoning through Ewell&amp;rsquo;s death and Atticus&amp;rsquo;s and the sheriff&amp;rsquo;s psychological shifts — yet throughout, the real killer is never once named.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Atticus wants to clarify the facts; the sheriff wants to protect the child.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;First, the protagonist says someone pulled Ewell away — she felt it was Jem. Based on this, Atticus deduces that Jem got up, pulled Ewell away, wrestled the knife from him, and killed him. Working from Atticus&amp;rsquo;s deduction that Jem is the killer, the sheriff wants to protect Jem and says, &amp;ldquo;Ewell fell dead on his own knife.&amp;rdquo; Atticus then says: &amp;ldquo;If we cover up the truth, that would go against everything I&amp;rsquo;ve ever taught Jem about how to be a person.&amp;rdquo; To convince Atticus, the sheriff even demonstrates the tripping scenario.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Gradually realizing the truth.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A thirteen-year-old boy with a broken arm is unlikely to fight and kill an adult in the dark. &amp;ldquo;Unless someone is very accustomed to the dark to qualify as a witness&amp;hellip;&amp;rdquo; — an unmistakable hint at Radley, who never leaves his house.&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;The key piece of evidence — the kitchen knife.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Atticus &amp;ldquo;suddenly&amp;rdquo; asks about the knife. The knife is still in Ewell&amp;rsquo;s body. Both of them individually realize the knife is Radley&amp;rsquo;s. They need to smooth over the knife issue. The sheriff suggests maybe Ewell found it in the dump and sharpened it.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Confirming the lie.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;There are many slow-motion descriptions interwoven here. Both Atticus and the sheriff are silently checking whether this lie has any holes, whether they should accept it: Ewell&amp;rsquo;s death was an accident — he killed himself. In the end, they reach an agreement. Even the eight-year-old protagonist says, &amp;ldquo;I can understand.&amp;rdquo;&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;The title drop.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&amp;ldquo;This man has done a great service for you and for this entire town. If people ignored his reclusive habits and forced him into the spotlight — I think, that would be a crime.&amp;rdquo; This line directly hits the novel&amp;rsquo;s theme. A mockingbird symbolizes an innocent, harmless person. Dragging Radley — this mockingbird — into the spotlight is a crime. It echoes what Atticus said earlier: &amp;ldquo;Remember, it&amp;rsquo;s a sin to kill a mockingbird.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Final Thoughts
 &lt;div id="final-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#final-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;To Kill a Mockingbird&lt;/em&gt; is a relatively simple story, easy to understand — at least compared to some of the books I&amp;rsquo;ve read before. The stories of the mockingbirds in the book left a deep impression. It reminds me of the &amp;ldquo;Brother Long&amp;rdquo; self-defense case from a few years ago in China. Without Brother Long, it&amp;rsquo;s likely that killing someone in self-defense would still get you a prison sentence here. Think about America nearly a hundred years ago, when the law itself was still newly established&amp;hellip; Think about that Black man wrongly convicted, shot dead by prison guards while trying to escape. What kind of despair must he have felt in that prison cell? He just wanted to be a decent person — and his life was suddenly cut short.&lt;/p&gt;
&lt;p&gt;Let me close with a line from Atticus: &amp;ldquo;I think there&amp;rsquo;s just one kind of folks. Folks.&amp;rdquo;&lt;/p&gt;</content:encoded></item><item><title>Book Notes — When Breath Becomes Air &amp; What Life Should Mean to You</title><link>https://lastdba.com/en/2024/08/12/book-notes-when-breath-becomes-air-what-life-should-mean-to-you/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-when-breath-becomes-air-what-life-should-mean-to-you/</guid><description>&lt;h2 class="relative group"&gt;Why Write About Two Books Together?
 &lt;div id="why-write-about-two-books-together" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-write-about-two-books-together" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Normally I&amp;rsquo;d write separate pieces after finishing these two books, but I figured neither would yield all that much content. Although I&amp;rsquo;ve read a few English originals (and written about them), I clearly underestimated the difficulty of &lt;em&gt;When Breath Becomes Air&lt;/em&gt;. It&amp;rsquo;s packed with unfamiliar vocabulary — loads of medical terms I&amp;rsquo;d never encountered. I basically forced my way through it with half-understanding. As for &lt;em&gt;What Life Should Mean to You&lt;/em&gt;&amp;hellip; it doesn&amp;rsquo;t feel as miraculous as people say. After all, it&amp;rsquo;s a century old — I didn&amp;rsquo;t extract much nourishment from it (a little, though). To avoid the awkwardness of too-thin content, I&amp;rsquo;m lumping them together.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Why Write About Two Books Together?
 &lt;div id="why-write-about-two-books-together" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-write-about-two-books-together" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Normally I&amp;rsquo;d write separate pieces after finishing these two books, but I figured neither would yield all that much content. Although I&amp;rsquo;ve read a few English originals (and written about them), I clearly underestimated the difficulty of &lt;em&gt;When Breath Becomes Air&lt;/em&gt;. It&amp;rsquo;s packed with unfamiliar vocabulary — loads of medical terms I&amp;rsquo;d never encountered. I basically forced my way through it with half-understanding. As for &lt;em&gt;What Life Should Mean to You&lt;/em&gt;&amp;hellip; it doesn&amp;rsquo;t feel as miraculous as people say. After all, it&amp;rsquo;s a century old — I didn&amp;rsquo;t extract much nourishment from it (a little, though). To avoid the awkwardness of too-thin content, I&amp;rsquo;m lumping them together.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6f97f7438495.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;When Breath Becomes Air
 &lt;div id="when-breath-becomes-air" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#when-breath-becomes-air" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The author was a surgeon with extraordinary achievements in medicine. At the peak of his career, he learned he had terminal cancer. Less than two years after the diagnosis, he passed away. This book was written during those two years. It describes, from a first-person perspective, how one confronts such misfortune as cancer and reflects on life and its meaning in one&amp;rsquo;s final days.&lt;/p&gt;
&lt;p&gt;When he learned he had terminal cancer, as a top surgeon he knew exactly what it meant. He knew he didn&amp;rsquo;t have long to live. At first, he was even angry — why did such a low-probability event happen to me? Why me? Something like this is hard for anyone to accept. But only the truly ill live day by day with the pain, quietly walking toward that inevitable but unscheduled death.&lt;/p&gt;
&lt;p&gt;After his diagnosis, he and his wife decided to have a child immediately — before chemotherapy began. The author also managed to spend a few months with his baby daughter before passing. He wanted to watch his precious daughter grow up, to know what she&amp;rsquo;d be like when she was older — though he was certain he&amp;rsquo;d never know. It seems almost too cruel.&lt;/p&gt;
&lt;p&gt;Near the end of the book (about twenty or thirty pages from the finish), the author&amp;rsquo;s prose abruptly stops. What follows is a chapter written by his wife, opening with: &amp;ldquo;Paul has left us&amp;hellip;&amp;rdquo; Even knowing how it would end, I couldn&amp;rsquo;t accept it — death came so suddenly that he couldn&amp;rsquo;t even finish his book&amp;hellip; But thinking about it from the book&amp;rsquo;s intended meaning, this incompleteness is, in a way, a kind of completion&amp;hellip;&lt;/p&gt;
&lt;p&gt;How should we view death? If I were to die before forty, what would I do? I&amp;rsquo;d certainly be unwilling — there are too many things I haven&amp;rsquo;t finished. The author ultimately saw through the meaning of life; he believed the most important thing is to experience life and live in the present moment.
I seem to be different — I live in the future, never now! If I go die right now, I&amp;rsquo;d leave this world accompanied by anger and resentment.&lt;/p&gt;
&lt;p&gt;(His experience inevitably reminds me of the Japanese drama &lt;em&gt;The White Tower&lt;/em&gt; — an absolutely brilliant show! Professor Zaizen, at the peak of his career, gets cancer and ultimately donates his body for cancer pathology research&amp;hellip;)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0b8ac609261a.png" alt="在这里插入图片描述" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;What Life Should Mean to You (Beyond Inferiority)
 &lt;div id="what-life-should-mean-to-you-beyond-inferiority" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-life-should-mean-to-you-beyond-inferiority" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A famous work in psychology by Alfred Adler, founder of individual psychology. Long ago, I watched an episode of &lt;em&gt;Lao Gao and Xiao Mo&lt;/em&gt; about Adler and individual psychology — they made it sound almost miraculous. I couldn&amp;rsquo;t resist reading it, and figured I might even analyze myself a bit.&lt;/p&gt;
&lt;p&gt;The most important idea in individual psychology is: how we perceive traumatic experiences is the essence of psychological problems — not the experiences themselves causing them. But this doesn&amp;rsquo;t deny the influence of the &amp;ldquo;past&amp;rdquo; on people&amp;rsquo;s behavior.&lt;/p&gt;
&lt;p&gt;However, I personally found the book somewhat boring&amp;hellip; &amp;ldquo;a bit too humanities-oriented.&amp;rdquo; The essential differences between short chapters aren&amp;rsquo;t that significant — it&amp;rsquo;s just discussing individual psychology through different topics. I genuinely couldn&amp;rsquo;t extract substantial nourishment from it. Maybe because it was written a hundred years ago, or maybe I&amp;rsquo;m just not cut out for this.&lt;/p&gt;
&lt;p&gt;Adler also proposed: a group (or a couple) should think and act for the benefit of the collective, or else problems of separation will arise. If one person harbors self-serving thoughts, the group is bound to be unstable. I couldn&amp;rsquo;t agree more. I could write some self-analysis here, but I don&amp;rsquo;t want to expose myself — which is also why I felt this book note wouldn&amp;rsquo;t be very substantial.&lt;/p&gt;
&lt;p&gt;Before reading this book, I also sampled &lt;em&gt;The Courage to Be Disliked&lt;/em&gt; and &lt;em&gt;How to Win Friends and Influence People&lt;/em&gt;. Since both had higher ratings than the &amp;ldquo;founding father&amp;rdquo; Adler&amp;rsquo;s book, I checked them out to see what they were about — and I didn&amp;rsquo;t like either. &lt;em&gt;Courage&lt;/em&gt; is just a dialogue between two people — the classic wise-man-and-scholar format — where you learn the book&amp;rsquo;s ideas through conversation&amp;hellip; I gave up after a bit. Bestseller style. &lt;em&gt;How to Win Friends&lt;/em&gt; was more tolerable — it directly lays out life advice in plain terms. I read about ten pieces of advice — somewhat valuable — but I still couldn&amp;rsquo;t finish it. Bestseller style too.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Closing
 &lt;div id="closing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#closing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;d read a few English originals and clearly got a bit cocky — turns out I need to be realistic. Gauge the difficulty first before diving in.
I&amp;rsquo;d been wanting to read psychology for a while. After reading it, I&amp;rsquo;ve learned I&amp;rsquo;m not cut out for it.
Well, no matter what, I had to write this book note — recording my life, like Paul did.&lt;/p&gt;</content:encoded></item><item><title>Book Notes — Wild: From Lost to Found on the Pacific Crest Trail</title><link>https://lastdba.com/en/2024/08/12/book-notes-wild-from-lost-to-found-on-the-pacific-crest-trail/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/book-notes-wild-from-lost-to-found-on-the-pacific-crest-trail/</guid><description>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8de823138bee.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I came across this book because I saw it on an Obama-recommended reading list in an e-book app. This particular title felt special, and the ratings were good, so I decided to check it out. At first, reading the synopsis — a memoir by a hiking enthusiast — I assumed the book would just describe scenic views and the hardships of sleeping rough, probably not very &amp;ldquo;exciting.&amp;rdquo; But its writing has a distinctiveness all its own; it never feels boring. Once you start a short chapter, you simply can&amp;rsquo;t stop. By the end, when I saw only 10% of the pages remained, I actually felt a sense of imminent parting — a reluctance to say goodbye. This feeling of having discovered a treasure accompanied me throughout the entire reading.&lt;/p&gt;</description><content:encoded>&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8de823138bee.png" alt="Image description" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I came across this book because I saw it on an Obama-recommended reading list in an e-book app. This particular title felt special, and the ratings were good, so I decided to check it out. At first, reading the synopsis — a memoir by a hiking enthusiast — I assumed the book would just describe scenic views and the hardships of sleeping rough, probably not very &amp;ldquo;exciting.&amp;rdquo; But its writing has a distinctiveness all its own; it never feels boring. Once you start a short chapter, you simply can&amp;rsquo;t stop. By the end, when I saw only 10% of the pages remained, I actually felt a sense of imminent parting — a reluctance to say goodbye. This feeling of having discovered a treasure accompanied me throughout the entire reading.&lt;/p&gt;
&lt;p&gt;I used to be a devotee of physical books — I liked the sense of weight and substance, and the satisfaction of finishing a paper volume. Later, as I gradually came to embrace e-books, I discovered one advantage e-books have over physical ones: links. I found this book because a book mentioned in &lt;em&gt;Space Odyssey&lt;/em&gt; (I think) led me, via links, to &amp;ldquo;Obama&amp;rsquo;s recommendations,&amp;rdquo; and from all the Obama-recommended books I picked a few that interested me — one was &lt;em&gt;Wild&lt;/em&gt;. The protagonist of &lt;em&gt;Wild&lt;/em&gt; also loves to read, and she mentions several books; I bookmarked about five or six of them. So my originally barren reading list grew and flourished through this chain of links. These books are far, far better than those &amp;ldquo;Top Book Rankings&amp;rdquo; or &amp;ldquo;Essential Classics, Domestic and International.&amp;rdquo; Finishing a physical book easily leaves you wondering what to read next; e-books don&amp;rsquo;t have that problem.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Queen&amp;rsquo;s Journey
 &lt;div id="the-queens-journey" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-queens-journey" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After losing her mother, seeing her family fall apart, facing massive college debt, and descending into drug addiction, the author — perhaps wanting to rediscover herself — made &amp;ldquo;thorough&amp;rdquo; preparations and set out for the Pacific Crest Trail. For someone with zero hiking experience, the Pacific Crest Trail is the highest difficulty level. She called her overloaded backpack &amp;ldquo;the Monster&amp;rdquo; — so heavy she couldn&amp;rsquo;t even put it on properly. And just like that, this outdoor novice set off. Completing the entire trail takes four to six months. Along the way, you need to plan resupply points in advance, mailing food and essential supplies ahead to those locations. Once you reach a resupply point and restock, you return to the trail and press onward. The suffering on the journey — though hard to feel vicariously — you can sense how severe it was. The author alone had six toenails removed. This kind of agony, along with various unexpected incidents, is beyond what the average person can endure. That&amp;rsquo;s why the &amp;ldquo;failure rate&amp;rdquo; for people attempting this trail is very high. You need an exceptionally robust physique, a thorough plan, and some luck.&lt;/p&gt;
&lt;p&gt;Even with the dangers of wild animals, venomous snakes, scorching sun, glaciers, dehydration, and injuries in the wilderness, none compare to the danger of &amp;ldquo;people&amp;rdquo; — especially for a solo woman in her twenties. Once you experience the potential threat posed by humans, nature&amp;rsquo;s objective dangers almost feel like a relief. This reminds me of the plot of the HBO series &lt;em&gt;The Last of Us&lt;/em&gt;, which I watched recently: in a post-apocalyptic world, encountering zombies isn&amp;rsquo;t the scariest thing — encountering humans is.&lt;/p&gt;
&lt;p&gt;The author seems somewhat lascivious (at least by sexually conservative standards) — or maybe all Americans are this open about it. Before hitting the trail, she&amp;rsquo;d have one-night stands with many men, thoroughly enjoying the feeling, unapologetically describing this physical need and the sense of conquest in capturing a man.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;What I craved wasn&amp;rsquo;t someone to love, but just someone to press their body against mine.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;On the trail, she also fantasized about attractive men she encountered and secretly watched men undress. She also packed a lot of condoms in her backpack — sadly, not a single one was used by the end. Of course, I don&amp;rsquo;t mean she was only uninhibited about physical desires; she was just as emotionally and sentimentally passionate. No right or wrong — she simply expressed exactly how she felt in the moment. I really admire this kind of honest writing.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Pacific Crest Trail
 &lt;div id="the-pacific-crest-trail" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pacific-crest-trail" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5e5f1fbbc068.png" alt="PCT Map" /&gt;&lt;/p&gt;
&lt;p&gt;The Pacific Crest Trail is one of the world&amp;rsquo;s famous long-distance trails, located in the mountain ranges of the western United States — a range jokingly called &amp;ldquo;America&amp;rsquo;s Dragon Vein&amp;rdquo;&amp;hellip; Trail information is very easy to find and extremely well-documented. The author also relied on trail guidebooks for preparation and handling unexpected situations. The trail stretches 4,000 kilometers, spanning the contiguous United States from the Canadian border to the Mexican border, passing through Washington, Oregon, and California. It is one of the &lt;a href="https://www.pcta.org/our-work/national-trails-system/" target="_blank" rel="noreferrer"&gt;National Scenic Trails&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2bc7e1e56723.png" alt="National Trails Map" /&gt;&lt;/p&gt;
&lt;p&gt;I have essentially zero contact with hiking — my concept of it is still nil — so I can only be an armchair traveler envying these backpackers. After a bit of searching on outdoor hiking, I found there&amp;rsquo;s a tremendous amount to learn. Outdoor hiking not only offers spectacular scenery but apparently even has therapeutic functions — I easily found hiking psychotherapy associations just by searching. Let me quote a passage from the original about the spiritual world on the trail:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Now, I was wholly immersed in this world, living in a completely new way. Living so rootlessly, without even a roof over my head for shelter from wind and rain, made the world both much larger and much smaller.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;The Golden Touches
 &lt;div id="the-golden-touches" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-golden-touches" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;While reading this book, I kept thinking of &lt;em&gt;Educated: A Memoir&lt;/em&gt;. Both are memoirs describing a period of the authors&amp;rsquo; pasts. Not only are their writing styles similar, but their upbringings are too — growing up in isolated mountains, having an abusive father, a backward family life, and unexpectedly being extremely good at studying and getting into a top university.&lt;/p&gt;
&lt;p&gt;More importantly, their writing style easily captures the reader&amp;rsquo;s emotions without ever feeling boring or stifling. I haven&amp;rsquo;t managed to fully summarize how they write, but one thing I paid special attention to: the accumulation of emotion followed by the unexpected move.&lt;/p&gt;
&lt;p&gt;For example: when the author scatters her mother&amp;rsquo;s ashes into the earth, she keeps a few larger fragments of bone, unable to let go. Finally, she puts these unburned bone fragments into her mouth and swallows them.&lt;/p&gt;
&lt;p&gt;I was stunned reading this. Throughout the book, she describes her feelings for her mother in many places. Her mother&amp;rsquo;s death affected her profoundly. After flatly (or perhaps despairingly) describing her mother&amp;rsquo;s death and cremation, unable to let go, she chooses to swallow her mother&amp;rsquo;s bones into her stomach — so she can become one with her mother! What kind of emotion could drive such an act — one that most people would find impossible to accept — as a vessel for such heavy feeling? This swallowing motion conveys far more powerfully than endlessly repeated expressions of longing ever could, and it grabs the reader&amp;rsquo;s attention far more effectively.&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s also a passage about condoms. An older backpacker, seeing how much stuff she&amp;rsquo;s carrying, helps her sort through her pack, throwing out things that are completely useless. The old backpacker finds a big packet of condoms: &amp;ldquo;Are you sure you need these?&amp;rdquo; Having gained some trail experience, she knows the stuff is utterly useless — but as she throws out the big pack, she secretly keeps one~ Then, the next morning when she wakes up, that one condom is gone&amp;hellip;&lt;/p&gt;
&lt;p&gt;These plot points are so dramatized I almost suspected they were fabricated. But I carefully read the author&amp;rsquo;s preface — she says she merely omitted certain scenes and guarantees that the events are all true.&lt;/p&gt;
&lt;p&gt;Regardless, a touch of plot that slightly exceeds realistic logic is essential in writing — it grabs the reader&amp;rsquo;s heart. The authenticity of these &amp;ldquo;golden touches&amp;rdquo; themselves isn&amp;rsquo;t important; what matters is having that touch. Let me give an example from one of my favorite films, &lt;em&gt;Memories of Murder&lt;/em&gt;, which I&amp;rsquo;m sure many have seen. Years later, the old detective returns to the crime scene and meets a child. The child says someone else was just here, crouching and staring at this drainage ditch just like you. The old detective immediately realizes this person could be the murderer. He asks the child what the person looked like. The child says: &amp;ldquo;Just&amp;hellip; ordinary.&amp;rdquo; This moment is a stroke of genius. Many viewers obsess over who the killer actually is, but it doesn&amp;rsquo;t matter who it is. &amp;ldquo;The murderer is ordinary&amp;rdquo; — that&amp;rsquo;s what the film is trying to say.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Final Thoughts
 &lt;div id="final-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#final-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;What could have been a boring story was written into a captivating work, with genuine depth, rich and authentic emotion. It&amp;rsquo;s a memoir of following the author back to nature and rediscovering the self — absolutely worth reading!&lt;/p&gt;
&lt;p&gt;Recently, good books have been streaming in nonstop; my bookshelf is quite packed. But I&amp;rsquo;m not worried about them gathering dust at all, because I believe the quality of these books matches this one — reaching that &amp;ldquo;can&amp;rsquo;t-put-it-down&amp;rdquo; level, requiring no self-discipline to become completely immersed. Let me quote from a hiking expert&amp;rsquo;s blog:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&amp;gt; What was your favorite stretch of scenery?
&amp;gt; The next one.&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>Case Study: Analyzing Occasional Slow INSERT VALUES</title><link>https://lastdba.com/en/2024/08/12/case-study-analyzing-occasional-slow-insert-values/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/case-study-analyzing-occasional-slow-insert-values/</guid><description>&lt;p&gt;The business team reported that INSERT VALUES occasionally became slow. By the time I checked the active sessions, the slow write problem had already subsided.&lt;/p&gt;
&lt;p&gt;Later, I discovered that the slow write problem lasted less than half a minute, with INSERT VALUES taking 1-2 seconds. I wrote a script to capture active session information and managed to get the session data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WALRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DataFileRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; BgWriterMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AutoVacuumMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;385&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LogicalLauncherMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The most abnormal wait event was WALWrite with 40 sessions.&lt;/p&gt;</description><content:encoded>&lt;p&gt;The business team reported that INSERT VALUES occasionally became slow. By the time I checked the active sessions, the slow write problem had already subsided.&lt;/p&gt;
&lt;p&gt;Later, I discovered that the slow write problem lasted less than half a minute, with INSERT VALUES taking 1-2 seconds. I wrote a script to capture active session information and managed to get the session data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WALRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DataFileRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; BgWriterMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AutoVacuumMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ClientRead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;385&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; LogicalLauncherMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The most abnormal wait event was WALWrite with 40 sessions.&lt;/p&gt;
&lt;p&gt;Two of the WALWrite-waiting sessions looked like this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partofquery
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+-------------------------------+-------------------------------+---------------+-----------------+--------+--------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;144955&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;516574&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;516588&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1( 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;179869&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;116371&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;116386&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1( &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s search the source code for WALWrite-related content:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; WALWriteLock: must be held to write WAL buffers to &lt;span style="color:#a6e22e"&gt;disk&lt;/span&gt; (XLogWrite or
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; XLogFlush).&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * LWLockAcquireOrWait - Acquire lock, or wait until it&amp;#39;s free
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * The semantics of this function are a bit funky. If the lock is currently
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * free, it is acquired in the given mode, and the function returns true. If
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * the lock isn&amp;#39;t immediately free, the function waits until it is released
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * and returns false, but does not acquire the lock.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This is currently used for WALWriteLock: when a backend flushes the WAL,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * holding WALWriteLock, it can flush the commit records of many other
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * backends as a side-effect. Those other backends need to wait until the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * flush finishes, but don&amp;#39;t need to acquire the lock anymore. They can just
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * wake up, observe that their records have already been flushed, and return.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When WAL is written from WAL buffers to disk, the WALWriteLock must be held.&lt;/p&gt;
&lt;p&gt;When a backend flushes WAL while holding WALWriteLock, it can also flush the commit records of other backends. Those other backends need to wait for this flush to finish, but they don&amp;rsquo;t need to acquire the lock afterward. If their WAL has been flushed, they can return directly (rather than flushing WAL again).&lt;/p&gt;
&lt;p&gt;&lt;code&gt;XLogFlush&lt;/code&gt; is extremely important. The key code in &lt;code&gt;XLogFlush&lt;/code&gt; is in the for loop:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Ensure that all XLOG data through the given position is flushed to disk.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * NOTE: this differs from XLogWrite mainly in that the WALWriteLock is not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * already held, and we try to avoid acquiring it if possible.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;XLogFlush&lt;/span&gt;(XLogRecPtr record)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Now wait until we get the write lock, or someone else does the flush
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * for us.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (;;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		XLogRecPtr	insertpos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* read LogwrtResult and update local state */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SpinLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;info_lck);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (WriteRqstPtr &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;LogwrtRqst.Write)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			WriteRqstPtr &lt;span style="color:#f92672"&gt;=&lt;/span&gt; XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;LogwrtRqst.Write;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		LogwrtResult &lt;span style="color:#f92672"&gt;=&lt;/span&gt; XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;LogwrtResult;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;SpinLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;info_lck);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* done already? */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (record &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; LogwrtResult.Flush)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Before actually performing the write, wait for all in-flight
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * insertions to the pages we&amp;#39;re about to write to finish.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		insertpos &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;WaitXLogInsertionsToFinish&lt;/span&gt;(WriteRqstPtr);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Try to get the write lock. If we can&amp;#39;t get it immediately, wait
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * until it&amp;#39;s released, and recheck if we still need to do the flush
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * or if the backend that held the lock did it for us already. This
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * helps to maintain a good rate of group committing when the system
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * is bottlenecked by the speed of fsyncing.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;LWLockAcquireOrWait&lt;/span&gt;(WALWriteLock, LW_EXCLUSIVE))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * The lock is now free, but we didn&amp;#39;t acquire it yet. Before we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * do, loop back to check if someone else flushed the record for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * us already.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;continue&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Got the lock; recheck whether request is satisfied */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		LogwrtResult &lt;span style="color:#f92672"&gt;=&lt;/span&gt; XLogCtl&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;LogwrtResult;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (record &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; LogwrtResult.Flush)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(WALWriteLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Sleep before flush! By adding a delay here, we may give further
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * backends the opportunity to join the backlog of group commit
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * followers; this can significantly improve transaction throughput,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * at the risk of increasing transaction latency.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * We do not sleep if enableFsync is not turned on, nor if there are
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * fewer than CommitSiblings other backends with active transactions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (CommitDelay &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt; enableFsync &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;MinimumActiveBackends&lt;/span&gt;(CommitSiblings))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;pg_usleep&lt;/span&gt;(CommitDelay);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Re-check how far we can now flush the WAL. It&amp;#39;s generally not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * safe to call WaitXLogInsertionsToFinish while holding
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * WALWriteLock, because an in-progress insertion might need to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * also grab WALWriteLock to make progress. But we know that all
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * the insertions up to insertpos have already finished, because
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * that&amp;#39;s what the earlier WaitXLogInsertionsToFinish() returned.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * We&amp;#39;re only calling it again to allow insertpos to be moved
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * further forward, not to actually wait for anyone.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			insertpos &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;WaitXLogInsertionsToFinish&lt;/span&gt;(insertpos);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* try to write/flush later additions to XLOG as well */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		WriteRqst.Write &lt;span style="color:#f92672"&gt;=&lt;/span&gt; insertpos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		WriteRqst.Flush &lt;span style="color:#f92672"&gt;=&lt;/span&gt; insertpos;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;XLogWrite&lt;/span&gt;(WriteRqst, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(WALWriteLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* done */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;XLogFlush&lt;/code&gt; function is the main function for flushing dirty WAL:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Check if the dirty WAL that needs to be flushed has already been flushed by someone else. If so, return directly.&lt;/li&gt;
&lt;li&gt;Try to acquire the lock &lt;code&gt;WALWriteLock&lt;/code&gt; in exclusive mode, retrying continuously until the lock is acquired.&lt;/li&gt;
&lt;li&gt;Once the lock is acquired, check again if the dirty WAL that needs to be flushed has already been flushed by someone else. If so, release &lt;code&gt;WALWriteLock&lt;/code&gt; and return (during the lock acquisition wait, someone else might have flushed the dirty WAL — if so, there&amp;rsquo;s nothing to do).&lt;/li&gt;
&lt;li&gt;Wait for &lt;code&gt;commit_delay&lt;/code&gt; milliseconds, and if the number of concurrent committing transactions exceeds &lt;code&gt;commit_siblings&lt;/code&gt;, update the WAL write position to form a group commit. This step currently doesn&amp;rsquo;t apply because &lt;code&gt;CommitDelay&lt;/code&gt; defaults to 0, effectively meaning group commit is not enabled.&lt;/li&gt;
&lt;li&gt;Call &lt;code&gt;XLogWrite&lt;/code&gt; to write the log, release &lt;code&gt;WALWriteLock&lt;/code&gt; after completion.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;XLogFlush&lt;/code&gt; for flushing dirty WAL needs to check whether the currently requested dirty WAL has already been written. If not, it will hold &lt;code&gt;WALWriteLock&lt;/code&gt; until the &lt;code&gt;XLogWrite&lt;/code&gt; function completes writing the log. &lt;code&gt;XLogWrite&lt;/code&gt; is the specific function for writing WAL, such as writing to which position on which page.&lt;/p&gt;
&lt;p&gt;Returning to the wait events from active sessions, the &lt;code&gt;IO:WALWrite&lt;/code&gt; wait is relatively easy to understand, but how do we confirm whether &lt;code&gt;LWLock:WALWrite&lt;/code&gt; is a problem?&lt;/p&gt;
&lt;p&gt;From the &lt;code&gt;XLogFlush&lt;/code&gt; function logic, we know that &lt;code&gt;WALWriteLock&lt;/code&gt; is an exclusive LWLock that PostgreSQL acquires when writing dirty WAL (this makes sense — WAL commit information is written sequentially and can only be written in exclusive mode; you can&amp;rsquo;t let whoever writes fastest write first, as that could easily corrupt data). It&amp;rsquo;s a serialized write of WAL commit information.&lt;/p&gt;
&lt;p&gt;Understanding this part of the logic, looking back at &lt;code&gt;pg_stat_activity&lt;/code&gt;, we can see that there was &lt;strong&gt;only 1&lt;/strong&gt; &lt;code&gt;IO:WALWrite&lt;/code&gt;, while there were dozens of &lt;code&gt;LWLock:WALWrite&lt;/code&gt; waits.&lt;/p&gt;
&lt;p&gt;Although we can&amp;rsquo;t directly see the LWLock blocking chain, we can infer from the source code that &lt;strong&gt;LWLock:WALWrite is waiting on IO:WALWrite&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://www.postgresql.org/docs/16/wal-configuration.html" target="_blank" rel="noreferrer"&gt;official documentation&lt;/a&gt; has a section about &lt;code&gt;XLogFlush&lt;/code&gt; and adjusting WAL buffers:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Normally, WAL buffers should be written and flushed by an XLogFlush request, which is made, for the most part, at transaction commit time to ensure that transaction records are flushed to permanent storage. On systems with high WAL output, XLogFlush requests might not occur often enough to prevent XLogInsertRecord from having to do writes. On such systems one should increase the number of WAL buffers by modifying the wal_buffers parameter. When full_page_writes is set and the system is very busy, setting wal_buffers higher will help smooth response times during the period immediately following each checkpoint.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Under normal circumstances, WAL buffers are flushed by &lt;code&gt;XLogFlush&lt;/code&gt;, for example during transaction commit to write WAL logs to disk. If the WAL log volume is large but &lt;code&gt;XLogFlush&lt;/code&gt; is not triggered frequently enough (meaning mostly large transactions), &lt;code&gt;XLogInsertRecord&lt;/code&gt; needs to write uncommitted WAL records — meaning the WAL buffer is insufficient. In this case, increasing &lt;code&gt;wal_buffers&lt;/code&gt; may slightly help with system response time.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;There are two commonly used internal WAL functions: XLogInsertRecord and XLogFlush. XLogInsertRecord is used to place a new record into the WAL buffers in shared memory. If there is no space for the new record, XLogInsertRecord will have to write (move to kernel cache) a few filled WAL buffers&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Combined with a description from the &lt;code&gt;XLogInsertRecord&lt;/code&gt; function:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; We have now done all the preparatory work we can without holding a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; lock or modifying shared state. From here on, inserting the new WAL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; record to the shared WAL buffer cache is a two&lt;span style="color:#f92672"&gt;-&lt;/span&gt;step process:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1.&lt;/span&gt; Reserve the right amount of space from the WAL. The current head of
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 reserved space is kept in Insert&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;CurrBytePos, and is protected by
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 insertpos_lck.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2.&lt;/span&gt; Copy the record to the reserved WAL space. This involves finding the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 correct WAL buffer containing the reserved space, and copying the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	 &lt;span style="color:#f92672"&gt;*&lt;/span&gt;	 record in place. This can be done concurrently in multiple processes.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;XLogInsertRecord&lt;/code&gt; function is used to place new WAL records into the WAL buffer:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Writing requires reserving a certain amount of space.&lt;/li&gt;
&lt;li&gt;Copy the WAL record to the reserved WAL space (presumably the reserved space in the WAL buffer). &lt;strong&gt;Multiple processes can execute this in parallel.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Copying WAL records to the WAL buffer can be done in parallel. This is unlikely to be a bottleneck since it&amp;rsquo;s an in-memory copy with parallelism.&lt;/p&gt;
&lt;p&gt;But &lt;code&gt;XLogFlush&lt;/code&gt; is different — it holds an exclusive LWLock throughout the write. So, in scenarios with high concurrency and small transactions, increasing WAL buffers theoretically won&amp;rsquo;t be very effective.&lt;/p&gt;
&lt;p&gt;At this point, we can rule out &lt;code&gt;wal_buffers&lt;/code&gt; memory tuning and focus our attention on I/O. Looking at the I/O-related wait counts in &lt;code&gt;pg_stat_activity&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DataFileRead	&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DataFileExtend	&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WALWrite		&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WALRead			&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The INSERT VALUES slowness lasted less than a minute and was not normally present. However, looking at the normal session information, I/O class WALWrite waits were almost always there:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partofquery
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+-------------------------------+-------------------------------+---------------+-----------------+--------+--------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;72668&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828394&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;82841&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1( &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;77215&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;342541&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;342552&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;94904&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;442309&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;442323&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;88024&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;779086&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;779311&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; table2 &lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;103236&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;144283&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;144302&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;47342&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;192683&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;192699&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;75399&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;743023&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;743024&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; table1 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;221993&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzluser11 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;184532&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;184541&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WALWrite &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; table1 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;However, checking the I/O performance at that time, writing 15 MB/s was not high — in fact, it was relatively low compared to other time periods, and &lt;code&gt;w_await&lt;/code&gt; was also very low:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Device: rrqm&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s wrqm&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s r&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s w&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s rkB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s wkB&lt;span style="color:#f92672"&gt;/&lt;/span&gt;s avgrq&lt;span style="color:#f92672"&gt;-&lt;/span&gt;sz avgqu&lt;span style="color:#f92672"&gt;-&lt;/span&gt;sz await r_await w_await svctm &lt;span style="color:#f92672"&gt;%&lt;/span&gt;util
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;dm&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;322&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;187.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1515.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3572.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15344.00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22.23&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2.05&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1.20&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9.39&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.18&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0.15&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25.70&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There was no strong evidence pointing to a storage performance issue.&lt;/p&gt;
&lt;p&gt;At present, it appears to be transient lock contention during concurrent INSERT VALUES small transactions when flushing WAL. We can rule out the following options:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Concurrent small transactions — no need to &lt;a href="https://www.postgresql.org/docs/16/wal-configuration.html" target="_blank" rel="noreferrer"&gt;adjust WAL buffers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;WAL log volume is not large — no need to enable &lt;a href="https://dba.stackexchange.com/questions/338319/postgres-walwrite-waits-whats-the-bottleneck" target="_blank" rel="noreferrer"&gt;log compression&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Not many FPIs (Full Page Images) — no need to adjust checkpoint&lt;/li&gt;
&lt;li&gt;I/O pressure is not high — no need to &lt;a href="https://docs.dbmarlin.com/docs/kb/wait-events/postgresql/walwritelock/" target="_blank" rel="noreferrer"&gt;improve I/O performance&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;At minimum, the following optimizations can be made:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Enable database group commit (can be deferred if concerned about risk; testing required)&lt;/li&gt;
&lt;li&gt;Batch multiple INSERT VALUES statements at the application level to reduce WALWriteLock contention&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title>Case Study: Logical Replication Deadlocks Checkpoint, Walsender, and Backup</title><link>https://lastdba.com/en/2024/08/12/case-study-logical-replication-deadlocks-checkpoint-walsender-and-backup/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/case-study-logical-replication-deadlocks-checkpoint-walsender-and-backup/</guid><description>&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The backup process (&lt;code&gt;pg_start_backup()&lt;/code&gt;) was blocked by the checkpointer, and the checkpointer was blocked by the logical replication walsender. The database was still serving queries, but backup, checkpoint, and logical replication were all completely hung.&lt;/p&gt;
&lt;p&gt;Two processes in &lt;code&gt;pg_stat_activity&lt;/code&gt; showed an unusual wait event: &lt;code&gt;replication_slot_io&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;hostlzl:&lt;span style="color:#ae81ff"&gt;6666&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres][&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;]&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17630&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;35157&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; repuser
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PostgreSQL JDBC Driver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;37623&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75022&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;764475&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; walsender
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;658&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;hostlzl:&lt;span style="color:#ae81ff"&gt;6666&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres][&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;]&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;343116&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; checkpointer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;One walsender and one checkpointer. Both were started on April 2. Let&amp;rsquo;s check the walsender 173038 logs:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Symptoms
 &lt;div id="problem-symptoms" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-symptoms" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The backup process (&lt;code&gt;pg_start_backup()&lt;/code&gt;) was blocked by the checkpointer, and the checkpointer was blocked by the logical replication walsender. The database was still serving queries, but backup, checkpoint, and logical replication were all completely hung.&lt;/p&gt;
&lt;p&gt;Two processes in &lt;code&gt;pg_stat_activity&lt;/code&gt; showed an unusual wait event: &lt;code&gt;replication_slot_io&lt;/code&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;hostlzl:&lt;span style="color:#ae81ff"&gt;6666&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres][&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;]&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17630&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;35157&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; repuser
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PostgreSQL JDBC Driver
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;37623&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75022&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;764475&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; walsender
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;658&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;hostlzl:&lt;span style="color:#ae81ff"&gt;6666&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;postgres][&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;]&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;----+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;343116&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;backend_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; checkpointer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;One walsender and one checkpointer. Both were started on April 2. Let&amp;rsquo;s check the walsender 173038 logs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--repuser log
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:40:07.750 CST,,,173038,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:37623&amp;#34;&lt;/span&gt;,660b7e17.2a3ee,1,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,2024-04-02 11:40:07 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;connection received: host=30.88.75.58 port=37623&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:40:07.756 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,173038,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:37623&amp;#34;&lt;/span&gt;,660b7e17.2a3ee,2,&lt;span style="color:#e6db74"&gt;&amp;#34;authentication&amp;#34;&lt;/span&gt;,2024-04-02 11:40:07 CST,32/30,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;replication connection authorized: user=repuser&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:40:07.765 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,173038,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:37623&amp;#34;&lt;/span&gt;,660b7e17.2a3ee,3,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:40:07 CST,32/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;starting logical decoding for slot &amp;#34;&amp;#34;pg_lzldb_lzldb_ora_pgdb_pgdb&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;Streaming transactions committing after 4263/42E6EF88, reading WAL from 4263/41DAEBD0.&amp;#34;&lt;/span&gt;,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:40:07.765 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,173038,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:37623&amp;#34;&lt;/span&gt;,660b7e17.2a3ee,4,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:40:07 CST,32/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;logical decoding found consistent point at 4263/41DAEBD0&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;There are no running transactions.&amp;#34;&lt;/span&gt;,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Walsender 173038 only shows startup information. After that, no more log output — it likely hung from the very start.&lt;/p&gt;
&lt;p&gt;Scrolling back a bit, we can find an earlier walsender for the same replication slot (different PID, same slot name):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--84918 earlier startup logs
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:30:07.498 CST,,,84918,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:54898&amp;#34;&lt;/span&gt;,660b7bbf.14bb6,1,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,2024-04-02 11:30:07 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;connection received: host=30.88.75.58 port=54898&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:30:07.504 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,84918,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:54898&amp;#34;&lt;/span&gt;,660b7bbf.14bb6,2,&lt;span style="color:#e6db74"&gt;&amp;#34;authentication&amp;#34;&lt;/span&gt;,2024-04-02 11:30:07 CST,30/3,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;replication connection authorized: user=repuser&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:30:07.514 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,84918,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:54898&amp;#34;&lt;/span&gt;,660b7bbf.14bb6,3,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:30:07 CST,30/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;starting logical decoding for slot &amp;#34;&amp;#34;pg_lzldb_lzldb_ora_pgdb_pgdb&amp;#34;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;Streaming transactions committing after 4263/41DADE38, reading WAL from 4263/358F1340.&amp;#34;&lt;/span&gt;,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:30:07.516 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,84918,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:54898&amp;#34;&lt;/span&gt;,660b7bbf.14bb6,4,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:30:07 CST,30/0,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;logical decoding found consistent point at 4263/358F1340&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;There are no running transactions.&amp;#34;&lt;/span&gt;,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:36:07.061 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,86630,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:45227&amp;#34;&lt;/span&gt;,660b7bca.15266,5,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:30:18 CST,30/0,0,ERROR,XX000,&lt;span style="color:#e6db74"&gt;&amp;#34;could not write to file &amp;#34;&amp;#34;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb/state.tmp&amp;#34;&amp;#34;: Cannot allocate memory&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:36:40.151 CST,&lt;span style="color:#e6db74"&gt;&amp;#34;repuser&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,86630,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.75.58:45227&amp;#34;&lt;/span&gt;,660b7bca.15266,6,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,2024-04-02 11:30:18 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;disconnection: session time: 0:06:21.760 user=repuser database=lzldb host=30.88.75.58 port=45227&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;PostgreSQL JDBC Driver&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This replication slot was also started at 11:30:07. Six minutes later, it failed to write &lt;code&gt;state.tmp&lt;/code&gt; due to memory exhaustion.&lt;/p&gt;
&lt;p&gt;The checkpointer process 12729 also reported the same &lt;code&gt;state.tmp&lt;/code&gt; error — &lt;code&gt;&amp;quot;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb/state.tmp&amp;quot;&amp;quot;: File exists&amp;quot;&lt;/code&gt;. This error appeared ~30 seconds after the replication slot error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--checkpoint log
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:36:39.925 CST,,,12729,,660b7a17.31b9,4,,2024-04-02 11:23:03 CST,,0,LOG,58P02,&lt;span style="color:#e6db74"&gt;&amp;#34;could not create file &amp;#34;&amp;#34;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb/state.tmp&amp;#34;&amp;#34;: File exists&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:36:40.151 CST,,,12729,,660b7a17.31b9,5,,2024-04-02 11:23:03 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpoint complete: wrote 334 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.108 s, sync=0.082 s, total=217.083 s; sync files=139, longest=0.004 s, average=0.000 s; distance=2295 kB, estimate=2295 kB&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;2024-04-02 11:48:03.414 CST,,,12729,,660b7a17.31b9,6,,2024-04-02 11:23:03 CST,,0,LOG,00000,&lt;span style="color:#e6db74"&gt;&amp;#34;checkpoint starting: time&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After this, the checkpointer produced no more log output — it hung, just like the walsender.&lt;/p&gt;
&lt;p&gt;Searching for &lt;code&gt;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb/state.tmp&amp;quot;&amp;quot;: File exists&amp;quot;&lt;/code&gt; quickly leads to a community thread: &lt;a href="https://www.postgresql.org/message-id/14b3454f-2d68-c637-68e4-2b42ff976168%40postgrespro.ru" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/message-id/14b3454f-2d68-c637-68e4-2b42ff976168%40postgrespro.ru&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;The actual fix landed in &lt;a href="https://www.postgresql.org/docs/release/12.3/" target="_blank" rel="noreferrer"&gt;PG 12.3&lt;/a&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Ensure that a replication slot&amp;rsquo;s io_in_progress_lock is released in failure code paths (Pavan Deolasee)
This could result in a walsender later becoming stuck waiting for the lock.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 class="relative group"&gt;Deep Dive
 &lt;div id="deep-dive" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#deep-dive" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;We found the bug, but several questions remain:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Why did the walsender and checkpointer hang?&lt;/li&gt;
&lt;li&gt;Who is blocking whom — the walsender or the checkpointer?&lt;/li&gt;
&lt;li&gt;How was this triggered?&lt;/li&gt;
&lt;li&gt;What are the solutions?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Current version: 11.5.&lt;/p&gt;
&lt;p&gt;Pstack of both processes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@hostlzl:lzldb:6666: /pg/pg6666/data/pg_log&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pstack &lt;span style="color:#ae81ff"&gt;173038&lt;/span&gt; &lt;span style="color:#75715e"&gt;##walsender&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00002b9eec171a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002b9eec171a9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002b9eec171b3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00000000006b2512 in PGSemaphoreLock (sema=0x2b9ef5fdb0b8) at pg_sema.c:316&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000071e94c in LWLockAcquire (lock=lock@entry=0x2babd8cee5b8, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1243&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x00000000006ef7cb in SaveSlotToPath (slot=0x2babd8cee500, dir=dir@entry=0x7ffcaffd79f0 &amp;#34;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb&amp;#34;, elevel=elevel@entry=20) at slot.c:1249&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 0x00000000006f0515 in ReplicationSlotSave () at slot.c:653&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x00000000006d75d8 in LogicalConfirmReceivedLocation (lsn=&amp;lt;optimized out&amp;gt;) at logical.c:1049&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 0x00000000006d774d in LogicalIncreaseXminForSlot (current_lsn=current_lsn@entry=72994075200640, xmin=xmin@entry=1241611955) at logical.c:914&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#9 0x00000000006e0fb3 in SnapBuildProcessRunningXacts (builder=builder@entry=0x23146c0, lsn=72994075200640, running=running@entry=0x22e8978) at snapbuild.c:1146&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#10 0x00000000006d484c in DecodeStandbyOp (buf=0x7ffcaffd7eb0, buf=0x7ffcaffd7eb0, ctx=0x2209820) at decode.c:318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#11 LogicalDecodingProcessRecord (ctx=0x2209820, record=&amp;lt;optimized out&amp;gt;) at decode.c:121&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#12 0x00000000006e50e0 in XLogSendLogical () at walsender.c:2799&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#13 0x00000000006e7122 in WalSndLoop (send_data=send_data@entry=0x6e5080 &amp;lt;XLogSendLogical&amp;gt;) at walsender.c:2162&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#14 0x00000000006e7d91 in StartLogicalReplication (cmd=0x22eedd8) at walsender.c:1109&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#15 exec_replication_command (cmd_string=cmd_string@entry=0x2210c48 &amp;#34;START_REPLICATION SLOT pg_lzldb_lzldb_ora_pgdb_pgdb LOGICAL 4263/42E6EF88 (\&amp;#34;add-tables\&amp;#34; &amp;#39;public.acr_finance_coa_partition_17_01,public.acr_finance_coa_partition_17_02,public.acr_finance_coa_part&amp;#34;...) at walsender.c:1541&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#16 0x000000000072c899 in PostgresMain (argc=&amp;lt;optimized out&amp;gt;, argv=argv@entry=0x2216f78, dbname=0x2216c98 &amp;#34;lzldb&amp;#34;, username=&amp;lt;optimized out&amp;gt;) at postgres.c:4178&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#17 0x000000000047e481 in BackendRun (port=0x20eda0) at postmaster.c:4358&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#18 BackendStartup (port=0x20eda0) at postmaster.c:4030&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#19 ServerLoop () at postmaster.c:1707&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#20 0x00000000006c4359 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x21dbe90) at postmaster.c:1380&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#21 0x000000000047eefb in main (argc=3, argv=0x21dbe90) at main.c:228&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;[&lt;/span&gt;postgres@hostlzl:lzldb:6666: /pg/pg6666/data/pg_wal&lt;span style="color:#f92672"&gt;]&lt;/span&gt;$ pstack &lt;span style="color:#ae81ff"&gt;12729&lt;/span&gt; &lt;span style="color:#75715e"&gt;##checkpointer&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00002b9eec171a0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x00002b9eec171a9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00002b9eec171b3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 0x00000000006b2512 in PGSemaphoreLock (sema=0x2b9ef5fdcd38) at pg_sema.c:316&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x000000000071e94c in LWLockAcquire (lock=lock@entry=0x2babd8cee5b8, mode=mode@entry=LW_EXCLUSIVE) at lwlock.c:1243&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x00000000006ef7cb in SaveSlotToPath (slot=slot@entry=0x2babd8cee500, dir=dir@entry=0x7ffcaffd6ee0 &amp;#34;pg_replslot/pg_lzldb_lzldb_ora_pgdb_pgdb&amp;#34;, elevel=elevel@entry=15) at slot.c:1249&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#6 0x00000000006f11a7 in CheckPointReplicationSlots () at slot.c:1100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#7 0x00000000004f674f in CheckPointGuts (checkPointRedo=72994093982360, flags=flags@entry=128) at xlog.c:9146&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#8 0x00000000004fcc77 in CreateCheckPoint (flags=flags@entry=128) at xlog.c:8937&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#9 0x00000000006b8312 in CheckpointerMain () at checkpointer.c:491&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#10 0x000000000050ba15 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffcaffd7540) at bootstrap.c:451&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#11 0x00000000006c1cb9 in StartChildProcess (type=CheckpointerProcess) at postmaster.c:5337&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#12 0x00000000006c2f5a in reaper (postgres_signal_arg=&amp;lt;optimized out&amp;gt;) at postmaster.c:2867&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#13 &amp;lt;signal handler called&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#14 0x00002b9eed5ba783 in __select_nocancel () from /lib64/libc.so.6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#15 0x000000000047db38 in ServerLoop () at postmaster.c:1671&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#16 0x00000000006c4359 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x21dbe90) at postmaster.c:1380&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#17 0x000000000047eefb in main (argc=3, argv=0x21dbe90) at main.c:228&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The key observation is the &lt;code&gt;LWLockAcquire&lt;/code&gt; frame. Both the walsender and the checkpointer are trying to acquire the &lt;strong&gt;same LWLOCK address in exclusive mode&lt;/strong&gt;: &lt;code&gt;lock=lock@entry=0x2babd8cee5b8, mode=mode@entry=LW_EXCLUSIVE&lt;/code&gt; — waiting indefinitely.&lt;/p&gt;
&lt;p&gt;The function right above &lt;code&gt;LWLockAcquire&lt;/code&gt; is &lt;code&gt;SaveSlotToPath&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Looking at the source in &lt;code&gt;src/backend/replication/slot.c&lt;/code&gt;, the critical function &lt;code&gt;SaveSlotToPath&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//SaveSlotToPath stores slot state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SaveSlotToPath&lt;/span&gt;(ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot, &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;dir, &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; elevel)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{	&lt;span style="color:#75715e"&gt;//11.5 code
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		tmppath[MAXPGPATH];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;char&lt;/span&gt;		path[MAXPGPATH];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			fd;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	ReplicationSlotOnDisk cp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;		was_dirty;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* and don&amp;#39;t do anything if there&amp;#39;s nothing to write */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;was_dirty)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//Acquire LWLock in exclusive mode at function entry
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockAcquire&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;io_in_progress_lock, LW_EXCLUSIVE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//Note the fd logic — the error matches the second walsender error
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	fd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;OpenTransientFile&lt;/span&gt;(tmppath, O_CREAT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; O_EXCL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; O_WRONLY &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PG_BINARY);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (fd &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not create file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						tmppath)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;//The logic for writing to fd — the error matches the first walsender error
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; ((&lt;span style="color:#a6e22e"&gt;write&lt;/span&gt;(fd, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;cp, &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(cp))) &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(cp))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			save_errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pgstat_report_wait_end&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;CloseTransientFile&lt;/span&gt;(fd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* if write didn&amp;#39;t set errno, assume problem is no disk space */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; save_errno &lt;span style="color:#f92672"&gt;?&lt;/span&gt; save_errno : ENOSPC;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not write to file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						tmppath)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;io_in_progress_lock);	&lt;span style="color:#75715e"&gt;//Release LWLock at end of function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;SaveSlotToPath&lt;/code&gt; acquires &lt;code&gt;LWLockAcquire&lt;/code&gt; on the slot&amp;rsquo;s &lt;code&gt;io_in_progress_lock&lt;/code&gt; in &lt;code&gt;LW_EXCLUSIVE&lt;/code&gt; mode — very similar to the wait event name: &lt;code&gt;io_in_progress_lock&lt;/code&gt; ↔ &lt;code&gt;replication_slot_io&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;At the end of the function, &lt;code&gt;LWLockRelease&lt;/code&gt; releases the lock.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;But in both &lt;code&gt;if&lt;/code&gt; branches, there is no &lt;code&gt;LWLockRelease&lt;/code&gt; — the function just returns directly!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The PostgreSQL log shows &amp;ldquo;could not create file&amp;rdquo; for &lt;code&gt;tmppath&lt;/code&gt;, meaning the code hit one of those two &lt;code&gt;if&lt;/code&gt; branches — either the &lt;strong&gt;write to state.tmp failed&lt;/strong&gt; branch or the &lt;strong&gt;create state.tmp failed&lt;/strong&gt; branch.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s reconstruct the timeline from the logs:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;11:36:07&lt;/strong&gt;: Logical replication first error — &amp;ldquo;could not write to file &amp;hellip; state.tmp&amp;rdquo;. Replication link dies.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;11:36:39&lt;/strong&gt;: Checkpointer error — &amp;ldquo;could not create file &amp;hellip; state.tmp&amp;rdquo;. One second later, checkpoint &amp;ldquo;completes&amp;rdquo; with 0 dirty buffers, 0 WAL.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;11:40:07&lt;/strong&gt;: Logical replication starts again. No more output.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;11:48:03&lt;/strong&gt;: Checkpointer triggers &lt;code&gt;start&lt;/code&gt; again. No more output.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Important: the first and second logical replication connections belong to &lt;strong&gt;different&lt;/strong&gt; walsender PIDs; the first and second checkpoint entries belong to the &lt;strong&gt;same&lt;/strong&gt; checkpointer PID.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fault mechanism reconstructed:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Logical replication walsender, due to memory pressure, fails to write &lt;code&gt;state.tmp&lt;/code&gt;, leaving a residual &lt;code&gt;state.tmp&lt;/code&gt; file behind.&lt;/li&gt;
&lt;li&gt;The checkpointer, encountering the residual &lt;code&gt;state.tmp&lt;/code&gt;, enters the &lt;code&gt;if (fd &amp;lt; 0)&lt;/code&gt; branch in &lt;code&gt;SaveSlotToPath&lt;/code&gt; after acquiring the LWLock in exclusive mode — and returns &lt;strong&gt;without releasing the LWLock&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;A new walsender starts for logical replication and tries to acquire the LWLock at the top of &lt;code&gt;SaveSlotToPath&lt;/code&gt; — waits indefinitely.&lt;/li&gt;
&lt;li&gt;The checkpointer triggers a new checkpoint and also tries to acquire the LWLock at the top of &lt;code&gt;SaveSlotToPath&lt;/code&gt; — waits indefinitely.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;With the mechanism clear, the answers follow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why did the walsender and checkpointer hang?&lt;/strong&gt; Residual &lt;code&gt;state.tmp&lt;/code&gt;. The checkpointer held the LWLock without releasing it. Both walsender and checkpointer wait indefinitely.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Who blocks whom?&lt;/strong&gt; The checkpointer blocks the walsender.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;How was it triggered?&lt;/strong&gt; The previous walsender exhausted memory, leaving an uncleaned &lt;code&gt;state.tmp&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Solutions?&lt;/strong&gt; Force restart the database.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Reproduction
 &lt;div id="reproduction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reproduction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;For background on PostgreSQL logical replication, refer to: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;PG inner workings: Logical Replication&lt;/a&gt;. Key commands:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_test&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test_decoding&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_recvlogical &lt;span style="color:#f92672"&gt;-&lt;/span&gt;h &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;p &lt;span style="color:#ae81ff"&gt;5558&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#f92672"&gt;-&lt;/span&gt;U lzl &lt;span style="color:#75715e"&gt;--slot=logical_test --start -f recv.sql &amp;amp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The slot and replication link are ready:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,xact_start,state_change,wait_event,&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;,query &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idle&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; xact_start ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; query 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+-------------------------------+-------------------------------+---------------------+--------+----------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;59916&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; postgres &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;015534&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;015545&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,xact_start,state_change,wait_event,&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;,query &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity wher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;e &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idle&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; xact_start ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;59791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;566112&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WalSenderWaitForWAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_catalog.set_config(&lt;span style="color:#e6db74"&gt;&amp;#39;search_path&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,usename,application_name,backend_start,&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt;,pg_walfile_name_offset(sent_lsn) sentoffset,pg_walfile_name_offset(write_lsn) writeoffset,pg_walfile_name_offset(flush_lsn) flushoffset &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_replication;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sentoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; writeoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; flushoffset 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---------+------------------+------------------------------+-----------+------------------------------------+------------------------------------+------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;59791&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_recvlogical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56364&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; streaming &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000000000000001&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6612032&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000000000000001&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6612032&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;000000010000000000000001&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6612032&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since the problem is caused by &lt;code&gt;state.tmp&lt;/code&gt;, just &lt;code&gt;touch&lt;/code&gt; it under &lt;code&gt;pg_replslot&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;testhost logical_test]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pwd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pgdata&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;data11&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pg_replslot&lt;span style="color:#f92672"&gt;/&lt;/span&gt;logical_test&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;pg_recvlogical&lt;/code&gt; immediately errors:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_recvlogical: unexpected termination &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; replication stream: ERROR: could &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; file &lt;span style="color:#e6db74"&gt;&amp;#34;pg_replslot/logical_test/state.tmp&amp;#34;&lt;/span&gt;: File &lt;span style="color:#66d9ef"&gt;exists&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Manual &lt;code&gt;CHECKPOINT&lt;/code&gt; hangs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--hang&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now check the walsender and session states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; datid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usesysid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; usename &lt;span style="color:#f92672"&gt;|&lt;/span&gt; application_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; client_addr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; client_hostname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; client_port &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xact_start &lt;span style="color:#f92672"&gt;|&lt;/span&gt; query_start 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; state_change &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; backend_type 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------+-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-------------------------------+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-+&lt;/span&gt;&lt;span style="color:#75715e"&gt;-------------------------------+-----------------+---------------------+--------+-------------+--------------+--------------------------------------------------------+------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;... 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Activity &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LogicalLauncherMain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical replication launcher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;058523&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;checkpoint&lt;/span&gt;; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; client backend
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;77638&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16385&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_recvlogical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;127&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;56928&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;495833&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;497754&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;498329&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; pg_catalog.set_config(&lt;span style="color:#e6db74"&gt;&amp;#39;search_path&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; walsender
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LWLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; replication_slot_io &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; checkpointer&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Perfectly reproduced — two &lt;code&gt;replication_slot_io&lt;/code&gt; wait events.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PG 12.3 Code Fix
 &lt;div id="pg-123-code-fix" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg-123-code-fix" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//Here showing 15.3, which has an extra save_errno vs 12.3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;SaveSlotToPath&lt;/span&gt;(ReplicationSlot &lt;span style="color:#f92672"&gt;*&lt;/span&gt;slot, &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;dir, &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; elevel)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{	
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	fd &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;OpenTransientFile&lt;/span&gt;(tmppath, O_CREAT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; O_EXCL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; O_WRONLY &lt;span style="color:#f92672"&gt;|&lt;/span&gt; PG_BINARY);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (fd &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * If not an ERROR, then release the lock before returning. In case
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * of an ERROR, the error recovery path automatically releases the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * lock, but no harm in explicitly releasing even in that case. Note
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * that LWLockRelease() could affect errno.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			save_errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;io_in_progress_lock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		errno &lt;span style="color:#f92672"&gt;=&lt;/span&gt; save_errno;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode_for_file_access&lt;/span&gt;(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;could not create file &lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: %m&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						tmppath)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;LWLockRelease&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;slot&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;io_in_progress_lock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}	
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In &lt;strong&gt;every &lt;code&gt;if&lt;/code&gt; branch&lt;/strong&gt;, &lt;code&gt;LWLockRelease&lt;/code&gt; is called before returning. This eliminates the logical vulnerability where the LWLock is not released in certain code paths. The code is clearly more robust.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Solution Analysis
 &lt;div id="solution-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solution-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Deleting &lt;code&gt;state.tmp&lt;/code&gt; won&amp;rsquo;t help — the LWLock is already held; the file was just the trigger.&lt;/li&gt;
&lt;li&gt;Restarting the replication link or killing the downstream won&amp;rsquo;t help — the checkpointer is the one holding the LWLock.&lt;/li&gt;
&lt;li&gt;The checkpointer cannot be killed directly. The only solution in this state is a &lt;strong&gt;force restart&lt;/strong&gt; to perform instance recovery. A normal shutdown is impossible because &lt;code&gt;CHECKPOINT&lt;/code&gt; is blocked.&lt;/li&gt;
&lt;li&gt;The ultimate fix: upgrade to PG 12.3 or later.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;(I also tried using gdb to call &lt;code&gt;LWLockRelease&lt;/code&gt; with the LWLock address from pstack — it crashed the test instance immediately. Not recommended.)&lt;/em&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Logical replication is one of the most significant feature enhancements in recent PostgreSQL releases. Early versions did have many issues and pitfalls. PostgreSQL&amp;rsquo;s &lt;a href="https://blog.csdn.net/qq_40687433/article/details/136405862?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;ambitious logical replication approach&lt;/a&gt; shows genuine innovation, and the community continuously refines and strengthens it — nearly every minor release includes many logical replication updates. This case is a real-world example: the logical replication code is clearly becoming more robust.&lt;/p&gt;
&lt;p&gt;Logical replication has a lot of depth. Recommended reading: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;PG Inner Workings: Logical Replication&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Case Study: Predicate Out-of-Bounds and Prepared Statement Issues in PostgreSQL</title><link>https://lastdba.com/en/2024/08/12/case-study-predicate-out-of-bounds-and-prepared-statement-issues-in-postgresql/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/case-study-predicate-out-of-bounds-and-prepared-statement-issues-in-postgresql/</guid><description>&lt;h2 class="relative group"&gt;The Phenomenon
 &lt;div id="the-phenomenon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-phenomenon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Case: The execution plan changed and chose the wrong index, causing SQL performance to degrade from milliseconds to seconds. After collecting statistics, the business SQL was still slow. Ultimately, the problem was resolved by dropping the &lt;code&gt;DAILY_DATE&lt;/code&gt; time index and creating a composite index on &lt;code&gt;(DAILY_DATE, A_ID)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Questions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Why did the optimizer choose the &lt;code&gt;DAILY_DATE&lt;/code&gt; index instead of the more selective &lt;code&gt;A_ID&lt;/code&gt; index?&lt;/li&gt;
&lt;li&gt;Why did collecting statistics have no effect?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Stale Statistics
 &lt;div id="stale-statistics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#stale-statistics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Simplified SQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; A_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DAILY_DATE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; to_date(&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyyMMdd&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; PARTITION_KEY &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; PARTITION_KEY &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The optimizer chose the &lt;code&gt;DAILY_DATE&lt;/code&gt; index instead of the more selective &lt;code&gt;A_ID&lt;/code&gt; index:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;The Phenomenon
 &lt;div id="the-phenomenon" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-phenomenon" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Case: The execution plan changed and chose the wrong index, causing SQL performance to degrade from milliseconds to seconds. After collecting statistics, the business SQL was still slow. Ultimately, the problem was resolved by dropping the &lt;code&gt;DAILY_DATE&lt;/code&gt; time index and creating a composite index on &lt;code&gt;(DAILY_DATE, A_ID)&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Questions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Why did the optimizer choose the &lt;code&gt;DAILY_DATE&lt;/code&gt; index instead of the more selective &lt;code&gt;A_ID&lt;/code&gt; index?&lt;/li&gt;
&lt;li&gt;Why did collecting statistics have no effect?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Stale Statistics
 &lt;div id="stale-statistics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#stale-statistics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Simplified SQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; A_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DAILY_DATE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; to_date(&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyyMMdd&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; PARTITION_KEY &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; PARTITION_KEY &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The optimizer chose the &lt;code&gt;DAILY_DATE&lt;/code&gt; index instead of the more selective &lt;code&gt;A_ID&lt;/code&gt; index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;204&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; tablzl_p202401_DAILY_DATE_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tablzl_p202401 tablzl_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;203&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (DAILY_DATE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; to_date(&lt;span style="color:#e6db74"&gt;&amp;#39;20240223&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyyMMdd&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((partition_key &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;202401&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (partition_key &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;202402&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((A_ID)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;ID1234567890987654321&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_delete)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; tablzl_p202402_DAILY_DATE_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tablzl_p202402 tablzl_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;204&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (DAILY_DATE &lt;span style="color:#f92672"&gt;=&lt;/span&gt; to_date(&lt;span style="color:#e6db74"&gt;&amp;#39;20240223&amp;#39;&lt;/span&gt;::text, &lt;span style="color:#e6db74"&gt;&amp;#39;yyyyMMdd&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((partition_key &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;202401&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (partition_key &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;202402&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((A_ID)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;ID1234567890987654321&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_delete)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;For the &lt;code&gt;p202401&lt;/code&gt; partition, whether it uses the &lt;code&gt;DAILY_DATE&lt;/code&gt; or &lt;code&gt;A_ID&lt;/code&gt; index doesn&amp;rsquo;t make much difference, because the January partition has no data for February 23.&lt;/li&gt;
&lt;li&gt;For the &lt;code&gt;p202402&lt;/code&gt; partition, whether it uses the &lt;code&gt;DAILY_DATE&lt;/code&gt; or &lt;code&gt;A_ID&lt;/code&gt; index makes a huge difference. Using the &lt;code&gt;DAILY_DATE&lt;/code&gt; index, its estimated cost is 3.35 with rows=1, but in reality there are millions of rows, causing it to run for 2 seconds.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The statistics for &lt;code&gt;p202402&lt;/code&gt; contain MCV (Most Common Values):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stats &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; tablename&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tablzl_p202402&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; attname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;DAILY_DATE&amp;#39;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;gx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_vals &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_freqs &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0481&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;047766667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0466&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0449&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0441&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043833334&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043733332&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043466665&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043133333&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043066666&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;042366665&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041866668&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041366667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041366667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039766666&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0394&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039333332&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;..
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038766667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03863333&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0381&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038066667&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037966665&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037566666&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;036733333&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Calculate the sum of MCV frequencies:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0481&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;047766667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0466&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0449&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0441&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043833334&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043733332&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043466665&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043133333&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;043066666&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;042366665&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041866668&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041366667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041366667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039766666&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0394&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039333332&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038766667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;03863333&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;0381&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038066667&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037966665&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037566666&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;036733333&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;999999990&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It&amp;rsquo;s exactly 1, meaning the planner estimates that days 1-22 represent all the data in this partition, and day 23 should have 0 rows. So when estimating rows for day 23 data, the planner assumes rows=1, and thus chooses the &lt;code&gt;DAILY_DATE&lt;/code&gt; index. In reality, day 23 had millions of rows.&lt;/p&gt;
&lt;p&gt;Essentially, this is a problem caused by stale statistics. Why were the first 22 days fine, and why didn&amp;rsquo;t day 23 trigger automatic collection?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,reloptions &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tablzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reloptions 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; autovacuum_analyze_scale_factor;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; autovacuum_analyze_scale_factor 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The trigger threshold defaults to 0.1 — auto-ANALYZE only triggers when data changes reach 1/10. This is a monthly partition, with data inserted and updated daily. Early in the month, writing 2 million rows per day would trigger multiple ANALYZEs (the threshold of 50 can be ignored), but at month end, for example on day 23, writing 2 million rows would not trigger ANALYZE because only 1/23 of the data changed. In this scenario, data was also updated after insertion — 2 million inserts and 2 million updates — so the data change on day 23 was about 1/11, just barely not triggering ANALYZE. &lt;strong&gt;This also explains why the first 20 days ran stably.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Additionally, since the data change threshold is a ratio, as long as the daily data change volume is relatively uniform, this month-end statistics inaccuracy problem will always occur!&lt;/p&gt;

&lt;h2 class="relative group"&gt;Execution Plan Caching
 &lt;div id="execution-plan-caching" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#execution-plan-caching" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since this was a stale statistics problem, manually collecting statistics should have resolved it. In practice, however, after collection, the business SQL was still slow.&lt;/p&gt;
&lt;p&gt;After running ANALYZE, manual &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt; showed the correct execution plan.&lt;/p&gt;
&lt;p&gt;This indicated that ANALYZE should have helped, but it didn&amp;rsquo;t affect the business sessions. Since the SQL execution used long-lived sessions, I suspected that the JDBC driver was using prepared statements to cache execution plans (&lt;a href="https://jenkov.com/tutorials/jdbc/preparedstatement.html#:~:text=JDBC%20PreparedStatement%201%20Creating%20a%20PreparedStatement%20Before%20you,Reusing%20a%20PreparedStatement%20...%205%20PreparedStatement%20Performance%20" target="_blank" rel="noreferrer"&gt;JDBC PreparedStatement&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;In PostgreSQL 13 (RasesQL 1.3), collecting statistics does not invalidate prepared statements; re-parsing only happens by reconnecting the session.&lt;/p&gt;
&lt;p&gt;Prepared statements generate a generic execution plan. Due to inaccurate statistics, the generic execution plan, like the parameter-specific execution plan, could choose the wrong index.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Characteristics of Prepared Statements
 &lt;div id="characteristics-of-prepared-statements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#characteristics-of-prepared-statements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;psql&lt;/code&gt; supports prepared statements, controlled by the &lt;code&gt;plan_cache_mode&lt;/code&gt; parameter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;auto&lt;/code&gt;: default, uses the five-execution mechanism&lt;/li&gt;
&lt;li&gt;&lt;code&gt;force_custom_plan&lt;/code&gt;: always performs hard parsing, generating a custom plan&lt;/li&gt;
&lt;li&gt;&lt;code&gt;force_generic_plan&lt;/code&gt;: always uses the generic plan with bound variables&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Syntax:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; plan1(text,integer) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;deallocate&lt;/span&gt; plan1&lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;all&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- invalidates the prepared statement; disconnecting also works&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;View: (basically useless since it&amp;rsquo;s local — you can&amp;rsquo;t see anything in production)&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;How Generic Plans Are Generated
 &lt;div id="how-generic-plans-are-generated" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-generic-plans-are-generated" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Normally, a prepared statement can generate a generic plan after running 5 times. There are many demonstrations online, so I won&amp;rsquo;t demonstrate the normal case here. Below are the &amp;ldquo;magical&amp;rdquo; phenomena I observed during testing:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Prepare data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl1(id varchar(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; int);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text),&lt;span style="color:#66d9ef"&gt;EXTRACT&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;2023-11-30&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;1 minute&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1(id);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1(&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; tlzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Execute prepared statement
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; plan1(text,integer) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that only data before December was inserted — December has no data. At this point, querying December data can use the &lt;code&gt;month&lt;/code&gt; index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;035&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;036&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;170&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;058&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;551&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;021&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;021&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;168&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;046&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;488&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;017&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;157&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;040&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;419&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;019&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;020&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;160&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;044&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;479&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_month &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;018&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;041&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;426&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Sixth execution
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;12&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;044&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;045&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;023&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On the sixth execution, the generic plan was bound — but it wasn&amp;rsquo;t the same plan as the first five executions; it used the &lt;code&gt;id&lt;/code&gt; index. If &lt;code&gt;id&lt;/code&gt; had even higher cardinality, you could also observe cases where the generic plan simply couldn&amp;rsquo;t be bound (not shown here).&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s look at the source code:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;choose_custom_plan&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;choose_custom_plan&lt;/span&gt;(CachedPlanSource &lt;span style="color:#f92672"&gt;*&lt;/span&gt;plansource, ParamListInfo boundParams)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Generate custom plans until we have done at least 5 (arbitrary) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;num_custom_plans &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	avg_custom_cost &lt;span style="color:#f92672"&gt;=&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;total_custom_cost &lt;span style="color:#f92672"&gt;/&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;num_custom_plans;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Prefer generic plan if it&amp;#39;s less expensive than the average custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * plan. (Because we include a charge for cost of planning in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * custom-plan costs, this means the generic plan only has to be less
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * expensive than the execution cost plus replan cost of the custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * plans.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Note that if generic_cost is -1 (indicating we&amp;#39;ve not yet determined
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the generic plan cost), we&amp;#39;ll always prefer generic at this point.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;generic_cost &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; avg_custom_cost)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}		
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As long as the generic plan&amp;rsquo;s cost is less than the average cost of the first 5 custom plans, the generic plan is used.&lt;/p&gt;
&lt;p&gt;While the 5-execution mechanism is well-known, it&amp;rsquo;s important to note how the generic plan is generated. On the 5th execution, there is no generic plan yet (initially, &lt;code&gt;generic_cost=-1&lt;/code&gt;), so it directly goes to the &lt;code&gt;!customplan&lt;/code&gt; logic in &lt;code&gt;GetCachedPlan&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CachedPlan &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;GetCachedPlan&lt;/span&gt;(CachedPlanSource &lt;span style="color:#f92672"&gt;*&lt;/span&gt;plansource, ParamListInfo boundParams,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt; useResOwner, QueryEnvironment &lt;span style="color:#f92672"&gt;*&lt;/span&gt;queryEnv)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	customplan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;choose_custom_plan&lt;/span&gt;(plansource, boundParams);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;customplan)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;CheckCachedPlan&lt;/span&gt;(plansource))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* We want a generic plan, and we already have a valid one */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;gplan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;magic &lt;span style="color:#f92672"&gt;==&lt;/span&gt; CACHEDPLAN_MAGIC);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Build a new generic plan */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;BuildCachedPlan&lt;/span&gt;(plansource, qlist, NULL, queryEnv);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Just make real sure plansource-&amp;gt;gplan is clear */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;ReleaseGenericPlan&lt;/span&gt;(plansource);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Link the new generic plan into the plansource */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;gplan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; plan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;refcount&lt;span style="color:#f92672"&gt;++&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Immediately reparent into appropriate context */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;is_saved)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* saved plans all live under CacheMemoryContext */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;MemoryContextSetParent&lt;/span&gt;(plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;context, CacheMemoryContext);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;is_saved &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/* otherwise, it should be a sibling of the plansource */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;MemoryContextSetParent&lt;/span&gt;(plan&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;context,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;									 &lt;span style="color:#a6e22e"&gt;MemoryContextGetParent&lt;/span&gt;(plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;context));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* Update generic_cost whenever we make a new generic plan */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;generic_cost &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;cached_plan_cost&lt;/span&gt;(plan, false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * If, based on the now-known value of generic_cost, we&amp;#39;d not have
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * chosen to use a generic plan, then forget it and make a custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * plan. This is a bit of a wart but is necessary to avoid a
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * glitch in behavior when the custom plans are consistently big
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * winners; at some point we&amp;#39;ll experiment with a generic plan and
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * find it&amp;#39;s a loser, but we don&amp;#39;t want to actually execute that
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * plan.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			customplan &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;choose_custom_plan&lt;/span&gt;(plansource, boundParams);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * If we choose to plan again, we need to re-copy the query_list,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * since the planner probably scribbled on it. We can force
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * BuildCachedPlan to do that by passing NIL.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			qlist &lt;span style="color:#f92672"&gt;=&lt;/span&gt; NIL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; plan;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}	
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In the &lt;code&gt;!customplan&lt;/code&gt; logic, if a generic plan already exists, use it directly. If not, generate one via &lt;code&gt;BuildCachedPlan&lt;/code&gt;, which is the main logic for generating plans — converting a query tree into a plan tree.&lt;/p&gt;
&lt;p&gt;What about parameters? As the comments explain, pass NULL when there are no parameters to enter the plan generation logic:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;To build a generic, parameter&lt;span style="color:#f92672"&gt;-&lt;/span&gt;value&lt;span style="color:#f92672"&gt;-&lt;/span&gt;independent plan, pass NULL &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; boundParams. To build a custom plan, pass the actual parameter values via
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; boundParams&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;What execution plan does the optimizer prefer when NULL is passed? This part of the code logic is somewhat complex. From the optimizer&amp;rsquo;s perspective, there may be multiple plans to choose from, but one must be selected as the generic plan.&lt;/p&gt;
&lt;p&gt;And that selected generic plan is what gets compared against the cost of the first 5 plans.&lt;/p&gt;
&lt;p&gt;Why didn&amp;rsquo;t repeatedly executing a lower-cost plan produce the desired generic plan?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What the generic plan looks like has nothing to do with the first five execution plans — the first five only determine whether this generic plan gets bound.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From an optimizer design perspective, generic plans are meant to reduce parsing time and improve SQL execution efficiency, suitable for many small queries. The problem is that generic plans themselves are crude, and PostgreSQL introduced the five-execution mechanism precisely to reduce the likelihood of a generic plan being terrible.&lt;/p&gt;
&lt;p&gt;Even with the five-execution mechanism, the reasons a bad generic plan still gets bound are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Generic plans are plans too, and they can inherently be bad&lt;/li&gt;
&lt;li&gt;Statistics are inaccurate, so the generic plan&amp;rsquo;s cost estimate is very low&lt;/li&gt;
&lt;li&gt;The first five executions had low selectivity (or other factors) causing high custom plan costs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Prepared Statement Invalidation
 &lt;div id="prepared-statement-invalidation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#prepared-statement-invalidation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Besides DDL, &lt;code&gt;DEALLOCATE&lt;/code&gt;, and disconnecting sessions, collecting statistics can also invalidate prepared statements — but this is a PostgreSQL 14 feature.&lt;/p&gt;
&lt;p&gt;PostgreSQL 13:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;PostgreSQL will force re-analysis and re-planning of the statement before using it whenever database objects used in the statement have undergone definitional (DDL) changes since the previous use of the prepared statement&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;PostgreSQL 14:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;PostgreSQL will force re-analysis and re-planning of the statement before using it whenever database objects used in the statement have undergone definitional (DDL) changes or their planner statistics have been updated since the previous use of the prepared statement&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Test confirming that in PostgreSQL 13, collecting statistics does not invalidate prepared statements:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;033&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;033&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;098&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;050&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepare_time &lt;span style="color:#f92672"&gt;|&lt;/span&gt; parameter_types &lt;span style="color:#f92672"&gt;|&lt;/span&gt; from_sql 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+-----------------------------------------------+-------------------------------+-----------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plan1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; plan1(text,integer) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;966733&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;text,integer&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; tlzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ANALYZE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_prepared_statements;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; prepare_time &lt;span style="color:#f92672"&gt;|&lt;/span&gt; parameter_types &lt;span style="color:#f92672"&gt;|&lt;/span&gt; from_sql 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+-----------------------------------------------+-------------------------------+-----------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; plan1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PREPARE&lt;/span&gt; plan1(text,integer) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;966733&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;text,integer&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;month&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; plan1(&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;051&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;052&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((id)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (&lt;span style="color:#66d9ef"&gt;month&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;022&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;098&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;JDBC Prepared Statements
 &lt;div id="jdbc-prepared-statements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#jdbc-prepared-statements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Prepared statements are not unique to PostgreSQL — other databases also have similar pre-parsing features. For example, Oracle can achieve similar functionality.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://jenkov.com/tutorials/jdbc/preparedstatement.html#:~:text=JDBC%20PreparedStatement%201%20Creating%20a%20PreparedStatement%20Before%20you,Reusing%20a%20PreparedStatement%20...%205%20PreparedStatement%20Performance%20" target="_blank" rel="noreferrer"&gt;JDBC&lt;/a&gt; itself can call the database&amp;rsquo;s pre-parsing interface and directly use prepared statements.&lt;/p&gt;
&lt;p&gt;Example JDBC configuration:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;String &lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;select * from people where id=?&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PreparedStatement preparedStatement &lt;span style="color:#f92672"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;connection&lt;/span&gt;.prepareStatement(&lt;span style="color:#66d9ef"&gt;sql&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Recommendations
 &lt;div id="recommendations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#recommendations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;Reduce the table-level &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; to &lt;code&gt;0.02&lt;/code&gt; (why 0.02? Because 0.02 &amp;lt; 1/31). Since data is written and queried simultaneously, manual collection timing is hard to get right; reducing &lt;code&gt;autovacuum_analyze_scale_factor&lt;/code&gt; can only mitigate this problem.&lt;/li&gt;
&lt;li&gt;Consider removing the PREPARE setting in JDBC, or set &lt;code&gt;force_custom_plan&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Adjust the SQL logic.&lt;/li&gt;
&lt;li&gt;Adjust indexes: 4.1 Remove unnecessary time indexes; 4.2 Rebuild the index that gets chosen after predicate out-of-bounds as a composite index that includes the &lt;code&gt;id&lt;/code&gt; field (a good suggestion).&lt;/li&gt;
&lt;li&gt;Emergency procedure: If business performance doesn&amp;rsquo;t recover after statistics collection, and you&amp;rsquo;ve confirmed the execution plan has changed via manual EXPLAIN, consider killing sessions (for versions before 13).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Finally, predicate out-of-bounds problems exist in essentially all databases, especially on time-based fields. There is currently no simple yet perfectly effective solution. Oracle&amp;rsquo;s SPM (SQL Plan Management) gains another point in my favorability&amp;hellip;&lt;/p&gt;</content:encoded></item><item><title>From Extremely Slow Unique Index Scan to Index Bloat</title><link>https://lastdba.com/en/2024/08/12/from-extremely-slow-unique-index-scan-to-index-bloat/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/from-extremely-slow-unique-index-scan-to-index-bloat/</guid><description>&lt;h2 class="relative group"&gt;How Did a Primary Key Query Access Multiple Data Pages?
 &lt;div id="how-did-a-primary-key-query-access-multiple-data-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-did-a-primary-key-query-access-multiple-data-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Continuing from the previous article: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/137248306?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;A Classic Case of Long Transactions, Table Bloat, and LIMIT Problems&lt;/a&gt;, there was one point not explained in detail:&lt;/p&gt;
&lt;p&gt;Why does a query using the primary key generate so many shared hits?
Why does index bloat cause access to multiple data pages? Can&amp;rsquo;t data outside the page be located through the corresponding index entry?
This relates to index version management — indexes do carry some version information, but not much. Let&amp;rsquo;s first review PostgreSQL&amp;rsquo;s btree index structure.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;How Did a Primary Key Query Access Multiple Data Pages?
 &lt;div id="how-did-a-primary-key-query-access-multiple-data-pages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-did-a-primary-key-query-access-multiple-data-pages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Continuing from the previous article: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/137248306?spm=1001.2014.3001.5501" target="_blank" rel="noreferrer"&gt;A Classic Case of Long Transactions, Table Bloat, and LIMIT Problems&lt;/a&gt;, there was one point not explained in detail:&lt;/p&gt;
&lt;p&gt;Why does a query using the primary key generate so many shared hits?
Why does index bloat cause access to multiple data pages? Can&amp;rsquo;t data outside the page be located through the corresponding index entry?
This relates to index version management — indexes do carry some version information, but not much. Let&amp;rsquo;s first review PostgreSQL&amp;rsquo;s btree index structure.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0ed8fa95f2df.png" alt="Insert image description here" /&gt;（https://en.wikibooks.org/wiki/PostgreSQL/Index_Btree）&lt;/p&gt;
&lt;p&gt;This PG btree wiki diagram doesn&amp;rsquo;t explain how dead tuples and dead index entries are accessed — it lacks version information. For now, you don&amp;rsquo;t need to understand every detail of this structure; just know that a btree structure like this exists.&lt;/p&gt;
&lt;p&gt;To investigate the btree version access problem, let&amp;rsquo;s run a test:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab1(a bigserial,b char(&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab1_a &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab1(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (autovacuum_enabled &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;); &lt;span style="color:#75715e"&gt;--disable autovacuum
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;storage&lt;/span&gt; PLAIN; &lt;span style="color:#75715e"&gt;--disable toast&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tab1(b) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;zzzzzzzzz&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--View tuple info on the data page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+--------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111875&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--View index entry info on the index page (note: index page 0 is the meta page, has no data)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab1_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+---------+-------+------+-------------------------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Only one row inserted: data page 0 has only 1 tuple, index page 1 has only one entry pointing to ctid(0,1).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;xxxxxxx&amp;#39;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111875&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111876&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111876&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab1_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+---------+-------+------+-------------------------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After updating one row: data page 0 has 2 tuples. Only ctid(0,2) is alive. The tuple at lp=1 is &amp;ldquo;dead&amp;rdquo; but lp_flags is still &amp;ldquo;NORMAL&amp;rdquo;! Index page 1 still has only one entry pointing to ctid(0,1), which is the &amp;ldquo;dead&amp;rdquo; tuple. This is the principle of HOT (Heap-Only Tuple): when updating within the same page, the index entry is not updated. The index follows the ctid chain from the dead tuple to find the truly alive data tuple.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s update 10 times in a loop, producing 2 data pages and 1 index page:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;md5(i::text);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; LOOP; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After updates:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--First data page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flag
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;s 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-------------+--------+--------+-------+--------------------------------------------------------------------------------------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_REDIRECT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111876&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Second data page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+--------------------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;111877&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On the first data page (page 0), the LP_REDIRECT status directly tells us the page definitely has HOT chains. At lp=1 there is no other information — not even ctid, data, or infomask. You cannot trace through this lp to find the final data. For the first index entry, it&amp;rsquo;s sufficient to access ctid(0,1); there is no desired data row in this page. But data page 2 has no LP_REDIRECT, and the index can find the live tuple (1,5) within the page by following the ctid chain from ctid(1,0).&lt;/p&gt;
&lt;p&gt;Source code explanation of line pointer states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *lp_flags has these possible states. An UNUSED line pointer is available
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *for immediate re-use, the other states are not.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_UNUSED		0		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* unused (should always have lp_len=0) */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_NORMAL		1		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* used (should always have lp_len&amp;gt;0) */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_REDIRECT		2		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* HOT redirect (should have lp_len=0), actually not HOT but cross-page redirect indicator */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define LP_DEAD			3		&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* dead, may or may not have storage */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//Explanation of LP_REDIRECT
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Redirecting line pointer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	A line pointer that points to another line pointer and has no
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	associated tuple. It has the special lp_flags state LP_REDIRECT,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	and lp_off is the OffsetNumber of the line pointer it links to.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	This is used when a root tuple becomes dead but we cannot prune
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	the line pointer because there are non&lt;span style="color:#f92672"&gt;-&lt;/span&gt;dead heap&lt;span style="color:#f92672"&gt;-&lt;/span&gt;only tuples
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	further down the chain.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Looking back more carefully, the lp status of what we consider &amp;ldquo;dead&amp;rdquo; tuples is LP_NORMAL, not LP_DEAD. This is important because we&amp;rsquo;ll revisit this point later.&lt;/p&gt;
&lt;p&gt;Continuing to examine the index page:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab1_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+---------+-------+------+-------------------------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because an additional page was created, HOT no longer applies. The index is updated. The index page has only 2 entries, both alive (dead=f), each pointing to the first tuple of its respective page: (0,1) and (1,1). For cross-page updates, the index page is also updated, with each index entry pointing to its own page. Note: at this point the table has only 1 row of data, but the index has 2 entries, both alive. This is why a primary key scan accesses multiple data pages.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s update more data to produce multiple index pages:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;md5(i::text);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; LOOP; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--First index page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab1_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------------+---------+-------+------+-------------------------+------+----------+-------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1278&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4097&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1277&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(0,1)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(1,1)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;222&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(222,1)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(223,1)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;444&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(444,1)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(445,1)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;666&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(666,1)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(667,1)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;888&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(888,1)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(889,1)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Second index page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab1_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------+---------+-------+------+-------------------------+------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1278&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1278&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1279&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1279&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1280&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1280&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1281&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1281&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;152&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1429&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1429&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;153&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1430&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1430&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;153&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Third index page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab1_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------+---------+-------+------+-------------------------+------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4097&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1277&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There are 3 index pages total. Page 1 is the root node. Pages 2 and 3 are leaf nodes. The dead status of all their index entries is &amp;ldquo;f&amp;rdquo;.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s return to the SQL, using the primary key index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4012&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;594&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;596&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Blocks: exact&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1437&lt;/span&gt; dirtied&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1026&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; idx_tab1_a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;152&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;153&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1431&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;087&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;614&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When querying by primary key, shared hit is 1437, roughly matching the ~1430 table pages. Since indexes lack version information and the dead status of index entries hasn&amp;rsquo;t been updated, PostgreSQL follows all live index entries to find version information in the data pages. This is why a primary key index scan can be extremely slow.&lt;/p&gt;

&lt;h2 class="relative group"&gt;kill index item
 &lt;div id="kill-index-item" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#kill-index-item" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since indexes don&amp;rsquo;t store visibility information (i.e., MVCC version info), the visibility of the tuple pointed to by an index determines the index visibility itself. This is also why index-only scans in PostgreSQL still access data pages. Of course, with the visibility map (VM), the VM records which data pages are all-visible and all-frozen, so index-only scans won&amp;rsquo;t access those pages — they&amp;rsquo;re already visible.&lt;/p&gt;
&lt;p&gt;Even without VACUUM, the PostgreSQL kernel has a method for handling this kind of index bloat — kill index item. This feature is sometimes called Simple deletion or index deletion (terminology from &lt;code&gt;src/backend/access/nbtree/README&lt;/code&gt;). Essentially, it &lt;strong&gt;marks index entries corresponding to tuples that are already LP_DEAD as dead&lt;/strong&gt;, without changing the existing index structure.&lt;/p&gt;
&lt;p&gt;Source code function &lt;code&gt;_bt_killitems&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; _bt_killitems &lt;span style="color:#f92672"&gt;-&lt;/span&gt; set LP_DEAD state &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; items an indexscan caller has
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; told us were killed&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This clearly states that index scans trigger kill item operations (meaning &lt;strong&gt;SELECT can also trigger this operation to update the index&lt;/strong&gt;). This is easy to test. Since our previous data has already been index-scanned, let&amp;rsquo;s rebuild data for testing.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab2(a bigserial,b char(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab2_a &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab2(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab2_b &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab2(b);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab2 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (autovacuum_enabled &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;); &lt;span style="color:#75715e"&gt;--disable autovacuum
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab2 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;storage&lt;/span&gt; PLAIN; &lt;span style="color:#75715e"&gt;--disable toast
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Insert 1 row and update repeatedly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tab2(b) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;00000&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab2 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;i::text;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; LOOP; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Table pages
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab2&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-----------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;115&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;116&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;117&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;118&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;119&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;120&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;121&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_COMBOCID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Index a pages
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-----------+---------+-------+------+------+---------+-----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4097&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(44,5)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(44,6)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(47,53)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(47,54)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(51,43)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(51,44)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(55,33)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(55,34)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(59,23)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(59,24)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8360&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(63,13)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(63,14)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Index b pages
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_b&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+---------+---------+-------+------+------+---------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now query the table with a sequential scan, then examine the data tuple and index entry states:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;204&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3114&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;412&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;077&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;173&lt;/span&gt; dirtied&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;173&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;042&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;090&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab2&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+----------+--------+--------+-------+-----------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-----------+---------+-------+------+------+---------+-----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4097&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(44,5)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(44,6)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(47,53)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(47,54)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(51,43)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(51,44)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(55,33)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(55,34)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(59,23)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(59,24)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8360&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(63,13)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(63,14)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_b&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+---------+---------+-------+------+------+---------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Data tuples: all pages except the last were marked LP_DEAD.
Index entries: nothing changed.&lt;/p&gt;
&lt;p&gt;Now query again using index a:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_tab2_a &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;68&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;412&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;282&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;510&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;190&lt;/span&gt; dirtied&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;058&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;525&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab2&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+----------+--------+--------+-------+-----------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_DEAD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-----------+---------+-------+------+------+---------+-----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4097&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(44,5)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(44,6)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(47,53)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(47,54)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(51,43)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(51,44)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(55,33)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(55,34)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8414&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(59,23)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(59,24)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8360&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(63,13)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(63,14)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_b&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+---------+---------+-------+------+------+---------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;54&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The dead tuples in index a have all been marked dead=t, while dead tuples in index b remain dead=f because we haven&amp;rsquo;t scanned index b.&lt;/p&gt;
&lt;p&gt;Now query through index a again:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_tab2_a &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;68&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;412&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;020&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;021&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;059&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;033&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because the index entries for dead tuples in index a have been marked dead=t, there&amp;rsquo;s no need to check version information on data pages to determine whether tuples are &amp;ldquo;alive.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Why is shared hit=10 here, still somewhat high? Because kill index item only marks dead index entries without changing the index structure, so the number of index pages hasn&amp;rsquo;t decreased. These 10 shared hits correspond to 10 index pages (including the meta page).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; tab2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ANALYZE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relpages,reltuples &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_a&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reltuples 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_tab2_a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Bottom-Up deletion
 &lt;div id="bottom-up-deletion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bottom-up-deletion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In PG14, the trigger condition for index deletion was enhanced. As mentioned earlier, index deletion is triggered by scanning the index. In PG14, index deletion can also be triggered when an index page split is imminent, to find free index space and reduce the probability of page splits.&lt;/p&gt;
&lt;p&gt;This feature reduces index splits and thus also reduces index bloat, mitigating the problems caused by index bloat.&lt;/p&gt;
&lt;p&gt;For specific testing, see: &lt;a href="https://www.cybertec-postgresql.com/en/index-bloat-reduced-in-postgresql-v14/?spm=a2c6h.12873639.article-detail.8.2f153438mIV8JK" target="_blank" rel="noreferrer"&gt;INDEX BLOAT REDUCED IN POSTGRESQL V14&lt;/a&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;index deduplication
 &lt;div id="index-deduplication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-deduplication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PG13 introduced the index deduplication feature, which brings the GIN index posting list concept into btree indexes to reduce the space occupied by duplicate btree index entries and mitigate index split issues.&lt;/p&gt;
&lt;p&gt;Previously, btree index entries pointed to only one ctid (as we saw in the tests above). With deduplicate index items, one index entry can have a posting list, and one posting list can hold multiple ctids.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The representation of posting lists is almost identical to the posting lists used by GIN&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Like GIN posting tree(list) (the btree posting list may not exactly follow this structure — needs further study):&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8b0c1a3b1562.png" alt="Insert image description here" /&gt;
（https://postgrespro.com/blog/pgsql/4261647）&lt;/p&gt;
&lt;p&gt;Testing index deduplication:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab3(same char(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;),diff char(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab3_same &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab3(same);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab3_diff &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab3(diff);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tab3 &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;::text,i::text &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;99999&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; i;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab3_same&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+------------+---------+-------+------+------+----------+-----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;104&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4097&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;120&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;104&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;112&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8398&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(69,19)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(69,20)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;112&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8398&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(75,21)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(75,22)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;112&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8398&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;81&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(81,23)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(81,24)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;112&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8398&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;87&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(87,25)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(87,26)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;112&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8398&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1352&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;93&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;27&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(93,27)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(93,28)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;112&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8344&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1024&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;(99,29)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;(99,30)&amp;#34;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab3_diff&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+--------+---------+-------+------+------+--------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;24&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;... 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;63&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;) &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The tids column in the bt_page_items function is essentially the posting list. The &lt;code&gt;same&lt;/code&gt; field was inserted with identical data and produced deduplication in the index; the &lt;code&gt;diff&lt;/code&gt; field had no duplicate data and produced no deduplication.&lt;/p&gt;
&lt;p&gt;The space difference is enormous:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relpages,reltuples &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab3%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reltuples 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_tab3_diff &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1484&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;90000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_tab3_same &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;81&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;90000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Can unique indexes produce deduplication?
 &lt;div id="can-unique-indexes-produce-deduplication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#can-unique-indexes-produce-deduplication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Unique indexes have no duplicate data, so it seems like they wouldn&amp;rsquo;t. In practice, they can. Because even with unique indexes, when HOT can&amp;rsquo;t satisfy an update, multiple index entries are created. We can see this from the first test case in this article. Repeatedly updating a single row with UPDATE also produces deduplication, which occurs before delete index item.&lt;/p&gt;
&lt;p&gt;Additionally, when delete index item removes a posting list index entry, it must ensure that &lt;strong&gt;all&lt;/strong&gt; ctids under the posting list correspond to DEAD tuples.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Disabling deduplication
 &lt;div id="disabling-deduplication" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#disabling-deduplication" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Index deduplication was introduced in PG13. The feature is enabled by default and can be disabled at the index level. Modifying deduplicate_items on an index won&amp;rsquo;t directly change the existing index structure; it only affects newly inserted data.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab3_same &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (deduplicate_items&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab3_same1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab3(same) &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; (deduplicate_items&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;What does VACUUM do?
 &lt;div id="what-does-vacuum-do" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-does-vacuum-do" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;VACUUM does many things. Here we&amp;rsquo;ll only focus on table/index bloat and space reclamation, skipping wraparound and other topics.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s test with tab2, where we repeatedly updated a single row. Simple deletion has already been triggered, and table/index entries are almost all DEAD.&lt;/p&gt;
&lt;p&gt;Run VACUUM directly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt; tab2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: vacuuming &lt;span style="color:#e6db74"&gt;&amp;#34;public.tab2&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: scanned &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_tab2_a&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; remove &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: CPU: &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, elapsed: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: scanned &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_tab2_b&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; remove &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: CPU: &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, elapsed: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: &lt;span style="color:#e6db74"&gt;&amp;#34;tab2&amp;#34;&lt;/span&gt;: removed &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;173&lt;/span&gt; pages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: CPU: &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, elapsed: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_tab2_a&amp;#34;&lt;/span&gt; now &lt;span style="color:#66d9ef"&gt;contains&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; pages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions were removed.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; pages have been deleted, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; currently reusable.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CPU: &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, elapsed: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_tab2_b&amp;#34;&lt;/span&gt; now &lt;span style="color:#66d9ef"&gt;contains&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;276&lt;/span&gt; pages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions were removed.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;269&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; pages have been deleted, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; currently reusable.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CPU: &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, elapsed: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;INFO: &lt;span style="color:#e6db74"&gt;&amp;#34;tab2&amp;#34;&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;found&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; removable, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; nonremovable &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;173&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;173&lt;/span&gt; pages
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; dead &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; versions cannot be removed yet, oldest xmin: &lt;span style="color:#ae81ff"&gt;526&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;There were &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; unused item identifiers.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Skipped &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; pages due &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; buffer pins, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; frozen pages.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; pages &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; entirely empty.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CPU: &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, &lt;span style="color:#66d9ef"&gt;system&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s, elapsed: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; s.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;VACUUM&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;idx_tab2_a removed 10000 row versions in 10 pages, 7 index pages were deleted.
Table tab2 removed 10000 row versions in 173 pages.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--First page of the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab2&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-----------+--------+--------+-------+-----------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;45&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Last page of the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tab2&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+----+-----------+--------+--------+-------+-----------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;24&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;509&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9999&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--First index page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;NOTICE: page &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; deleted
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------------+---------+-------+------+------+------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;4294967295&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Last index page
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, itemlen, nulls, vars, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_a&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; itemlen &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nulls &lt;span style="color:#f92672"&gt;|&lt;/span&gt; vars &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------+---------+-------+------+------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;All line pointers for dead table tuples were marked UNUSED, data was cleaned, and only one live tuple remains in NORMAL state. The table still has the same number of pages.&lt;/p&gt;
&lt;p&gt;All dead index entries (dead=t) were cleaned. Live index entries were shifted within index pages (the last page&amp;rsquo;s index entry originally had itemoffset != 1). All emptied index pages were marked as deleted. These deleted index pages still exist, in a half-dead state.&lt;/p&gt;
&lt;p&gt;From the nbtree README on &amp;ldquo;Deleting entire pages during VACUUM&amp;rdquo; (the original is quite long; I&amp;rsquo;ve excerpted the key parts):&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;We consider deleting an entire page from the btree only when it&amp;rsquo;s become
completely empty of items.
Page deletion always begins from an empty leaf page. An
internal page can only be deleted as part of deleting an entire subtree.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;An entire page is only considered for deletion when the index page is completely empty. Deletion always starts from leaf nodes; non-leaf nodes are only deleted when deleting an entire subtree.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Deleting a leaf page is a two-stage process.&lt;br&gt;
In the first stage, the page
is unlinked from its parent, and marked as half-dead.
In the second-stage, the half-dead leaf page is unlinked from its siblings.
We first lock the left sibling (if any) of the target, the target page
itself, and its right sibling (there must be one) in that order. Then we
update the side-links in the siblings, and mark the target page deleted.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Deleting a leaf page has two stages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Unlink from the parent — the leaf page is now in half-dead state&lt;/li&gt;
&lt;li&gt;Unlink from left and right siblings — the leaf page is now in deleted state&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;&lt;p&gt;A deleted page cannot be recycled immediately, since there may be other
processes waiting to reference it (ie, search processes that just left the
parent, or scans moving right or left from one of the siblings). These
processes must be able to observe a deleted page for some time after the
deletion operation, in order to be able to at least recover from it (they
recover by moving right, as with concurrent page splits). Searchers never
have to worry about concurrent page recycling.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Because other processes may still be using the deleted page, VACUUM cannot immediately recycle these index pages.&lt;/p&gt;
&lt;p&gt;This description matches what we observed.&lt;/p&gt;
&lt;p&gt;Although after VACUUM, the index still has the same number of pages:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reltuples 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_tab2_a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tab2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;173&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The index scan no longer needs to access deleted pages:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_tab2_a &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;109&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;011&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;012&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;056&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;025&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Before VACUUM, shared hit=10. After VACUUM, the number of index pages hasn&amp;rsquo;t changed — still 10, with 8 pages deleted but not directly recycled, so shared hit=2. Why 2 is easy to understand: &amp;ldquo;meta page&amp;rdquo; + &amp;ldquo;the one surviving leaf page.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Placing deleted pages in the FSM
 &lt;div id="placing-deleted-pages-in-the-fsm" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#placing-deleted-pages-in-the-fsm" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;Recycling a page is decoupled from page deletion. A deleted page can only
be put in the FSM to be recycled once there is no possible scan or search
that has a reference to it; until then, it must stay in place with its
sibling links undisturbed, as a tombstone that allows concurrent searches
to detect and then recover from concurrent deletions (which are rather
like concurrent page splits to searchers)&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;strong&gt;What is &amp;ldquo;Placing deleted pages in the FSM&amp;rdquo;?&lt;/strong&gt; After an index page is deleted, it isn&amp;rsquo;t directly recycled. During index splits or new page allocation, it&amp;rsquo;s hard to find deleted pages for reuse. Placing deleted pages in the FSM puts these recyclable pages into the index&amp;rsquo;s corresponding FSM file, making it easy to find available free pages.&lt;/p&gt;
&lt;p&gt;As mentioned earlier, during the first VACUUM, those deleted pages are unlinked but still occupy space. Before PG14:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;We implement the technique by waiting until all active snapshots and
registered snapshots as of the page deletion are gone&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;One condition for deletion: all active snapshots and snapshots related to the deleted pages must have ended. So long transactions definitely affect placing.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Placing an already-deleted page in the FSM to be recycled when needed
doesn&amp;rsquo;t actually change the state of the page. The page will be changed
whenever it is subsequently taken from the FSM for reuse. The deleted
page&amp;rsquo;s contents will be overwritten by the split operation (it will become
the new right sibling page).&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Additionally, putting an already-deleted page into the FSM file doesn&amp;rsquo;t change the page&amp;rsquo;s state — this is just to quickly locate available free pages.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Prior to PostgreSQL 14, VACUUM would only place &lt;em&gt;old&lt;/em&gt; deleted pages that
it encounters during its linear scan (pages deleted by a previous VACUUM
operation) in the FSM. Newly deleted pages were never placed in the FSM,
because that was assumed to &lt;em&gt;always&lt;/em&gt; be unsafe.
PostgreSQL 14 added the ability for VACUUM to consider if it&amp;rsquo;s possible to
recycle newly deleted pages at the end of the full index scan where the
page deletion took place&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Before PG14, deleted pages produced by the first VACUUM were not placed in the FSM. Only &amp;ldquo;old&amp;rdquo; deleted pages would be placed in the FSM file. Starting from PG14, the first VACUUM also considers placing deleted pages in the FSM.&lt;/p&gt;
&lt;p&gt;Test (my version is PG13):&lt;/p&gt;
&lt;p&gt;The tab2 test above just ran one VACUUM. Although deleted pages were produced, the index has no corresponding FSM file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_tab2_a&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16437&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzlhost &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16437&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 81920 Apr 5 11:04 base/16384/16437&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now run VACUUM again:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; tab2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;lzlhost &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16437&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 81920 Apr 5 11:04 base/16384/16437
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 24576 Apr 5 15:52 base/16384/16437_fsm&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The index immediately generated an FSM file.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Flowchart: Index Bloat and Cleanup
 &lt;div id="flowchart-index-bloat-and-cleanup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#flowchart-index-bloat-and-cleanup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Please note:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The diagram below does not include table FSM/VM information&lt;/li&gt;
&lt;li&gt;The diagram below does not include deduplication information&lt;/li&gt;
&lt;li&gt;Version is PG13&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e9c38bb44b34.png" alt="Insert image description here" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;fillfactor
 &lt;div id="fillfactor" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fillfactor" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Above we covered various kernel-supported methods for reducing index bloat. Beyond these approaches that require little active participation, you can also adjust table and index fillfactor to control bloat.&lt;/p&gt;
&lt;p&gt;Fillfactor is essentially the waterline for tables or indexes. When &lt;strong&gt;INSERTING&lt;/strong&gt; data, once the page reaches the fillfactor line, insertion moves to the next page. Fillfactor is designed to leave room for UPDATE operations, preventing UPDATE from frequently seeking new pages.&lt;/p&gt;
&lt;p&gt;Although both tables and indexes have fillfactor with the same goal (accommodating UPDATE), the details differ significantly:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Tables&lt;/strong&gt;: If a table page still has space, UPDATE can happen within that page without needing to request a new page or go to another page with free space. Moreover, due to PostgreSQL&amp;rsquo;s unique HOT feature, in-page updates don&amp;rsquo;t update indexes, which naturally slows index bloat.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Indexes&lt;/strong&gt;: Different data rows or cross-page updates to the same row generate new index entries. Fillfactor leaves headroom in index pages, greatly reducing index split problems.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, fillfactor settings are closely tied to your workload. If data is like logs — monotonically increasing with zero updates — then setting both table and index fillfactor to 100 is reasonable. But most production tables have updates, and table/index fillfactor should not be 100. For frequent UPDATE workloads, fillfactor should be set even lower.&lt;/p&gt;
&lt;p&gt;However, PostgreSQL&amp;rsquo;s default fillfactor values are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Table default fillfactor=100&lt;/li&gt;
&lt;li&gt;Index default fillfactor=90&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With table fillfactor=100, HOT is completely unusable! Any UPDATE immediately seeks a new data page and creates a new index entry in the index&amp;rsquo;s 10% headroom. Eventually, update-heavy workloads constantly update indexes, and even 90 fillfactor on the index can&amp;rsquo;t hold up, leading to index splits&amp;hellip;&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a fillfactor test — two tables differ only in fillfactor, updating the same amount of data, comparing the final shared hit difference:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab4(a bigserial,b char(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab4_a &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab4(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab4_a &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (deduplicate_items&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;); &lt;span style="color:#75715e"&gt;--disable index deduplication
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab4 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;storage&lt;/span&gt; PLAIN; &lt;span style="color:#75715e"&gt;--disable toast
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab4 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (autovacuum_enabled &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;); &lt;span style="color:#75715e"&gt;--disable autovacuum&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--tab5 has the same definition as tab4, except table and index fillfactor are adjusted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tab5 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (fillfactor&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_tab5_a &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; (fillfactor&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tab4(b) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;lllllllllll&amp;#39;&lt;/span&gt;); 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Repeatedly update one row
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; LOOP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tab4 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;md5(i::text) &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; LOOP; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Primary key query with default fillfactor
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab4 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab4 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;412&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;894&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;895&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Blocks: exact&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;174&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; idx_tab4_a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;023&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;023&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;173&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;057&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;913&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Primary key query with lowered fillfactor
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab5 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tab5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;41&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4012&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;367&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;369&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Blocks: exact&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1434&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; idx_tab5_a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;195&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;195&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1429&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;059&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;390&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After lowering fillfactor, the reduction in shared hits is very significant, and Execution Time improves several times over. In fact, both data pages and index pages decreased.&lt;/p&gt;
&lt;p&gt;So, on update-heavy production tables, lowering table and index fillfactor can mitigate bloat problems.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Although index bloat always accompanies table bloat, their principles differ. HOT doesn&amp;rsquo;t update index entries; cross-page updates create new index entries.&lt;/p&gt;
&lt;p&gt;Lowering table and index fillfactor can slow bloat in update-heavy production tables, ultimately also slowing down SQL queries like primary key lookups.&lt;/p&gt;
&lt;p&gt;There are also several kernel-level features for improving index space efficiency:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cleaning dead index entries during index scans (index tuple deletion)&lt;/li&gt;
&lt;li&gt;Cleaning dead index entries during index splits (Bottom-Up index tuple deletion)&lt;/li&gt;
&lt;li&gt;Vacuum marking pages of entirely dead index entries (Deleting entire pages during VACUUM)&lt;/li&gt;
&lt;li&gt;Quickly locating recycled index pages during index splits (Placing deleted pages in the FSM)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;references
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;src/backend/access/nbtree/README
&lt;a href="https://mp.weixin.qq.com/s/GBN7dFQU72BfzvLSzlLmYA" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/GBN7dFQU72BfzvLSzlLmYA&lt;/a&gt;
&lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782857?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522171221125016800182737655%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&amp;amp;request_id=171221125016800182737655&amp;amp;biz_id=0&amp;amp;utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-1-130782857-null-null.nonecase&amp;amp;utm_term=lp_flags&amp;amp;spm=1018.2226.3001.4450" target="_blank" rel="noreferrer"&gt;pg事务：事务相关元组结构&lt;/a&gt;
&lt;a href="https://www.cybertec-postgresql.com/en/killed-index-tuples/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/killed-index-tuples/&lt;/a&gt;
&lt;a href="https://www.cybertec-postgresql.com/en/index-bloat-reduced-in-postgresql-v14/?spm=a2c6h.12873639.article-detail.8.2f153438mIV8JK" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/index-bloat-reduced-in-postgresql-v14/?spm=a2c6h.12873639.article-detail.8.2f153438mIV8JK&lt;/a&gt;
&lt;a href="https://www.cybertec-postgresql.com/en/b-tree-index-improvements-in-postgresql-v12/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/b-tree-index-improvements-in-postgresql-v12/&lt;/a&gt;
&lt;a href="https://www.cybertec-postgresql.com/en/b-tree-index-deduplication/" target="_blank" rel="noreferrer"&gt;https://www.cybertec-postgresql.com/en/b-tree-index-deduplication/&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Getting Started with HikariCP Connection Pool</title><link>https://lastdba.com/en/2024/08/12/getting-started-with-hikaricp-connection-pool/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/getting-started-with-hikaricp-connection-pool/</guid><description>&lt;h2 class="relative group"&gt;A Brief Introduction to HikariCP
 &lt;div id="a-brief-introduction-to-hikaricp" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-brief-introduction-to-hikaricp" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&amp;ldquo;Hikari&amp;rdquo; means &amp;ldquo;light&amp;rdquo; in Japanese — HikariCP aims to be a Connection Pool as light and fast as light. This nearly Java-only middleware connection pool is extremely lightweight and performance-focused. HikariCP is now the default connection pool for Spring Boot, and with the proliferation of Spring Boot and microservices, HikariCP usage continues to grow.&lt;/p&gt;
&lt;p&gt;On the HikariCP GitHub homepage, there&amp;rsquo;s a performance comparison:



&lt;img src="https://lastdba.com/img/csdn/3da983db32ca.png" alt="在这里插入图片描述" /&gt;
（https://github.com/brettwooldridge/HikariCP-benchmark）&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;A Brief Introduction to HikariCP
 &lt;div id="a-brief-introduction-to-hikaricp" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-brief-introduction-to-hikaricp" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&amp;ldquo;Hikari&amp;rdquo; means &amp;ldquo;light&amp;rdquo; in Japanese — HikariCP aims to be a Connection Pool as light and fast as light. This nearly Java-only middleware connection pool is extremely lightweight and performance-focused. HikariCP is now the default connection pool for Spring Boot, and with the proliferation of Spring Boot and microservices, HikariCP usage continues to grow.&lt;/p&gt;
&lt;p&gt;On the HikariCP GitHub homepage, there&amp;rsquo;s a performance comparison:



&lt;img src="https://lastdba.com/img/csdn/3da983db32ca.png" alt="在这里插入图片描述" /&gt;
（https://github.com/brettwooldridge/HikariCP-benchmark）&lt;/p&gt;
&lt;p&gt;It appears to crush all other database connection pool middleware. However, this performance comparison is somewhat dated and lacks a comparison with Alibaba&amp;rsquo;s homegrown pinnacle connection pool, Druid. I briefly checked &lt;a href="https://github.com/alibaba/druid" target="_blank" rel="noreferrer"&gt;Druid&amp;rsquo;s&lt;/a&gt; GitHub page — it actually has slightly more stars than HikariCP. Druid is clearly stronger in terms of functionality. As for which has better performance, it even sparked &lt;a href="https://github.com/brettwooldridge/hikaricp/issues/232" target="_blank" rel="noreferrer"&gt;a spat between experts&lt;/a&gt;, and I haven&amp;rsquo;t seen any rigorous performance comparison report yet. But that&amp;rsquo;s not the focus of this article&amp;hellip; this article is just to get a basic understanding of HikariCP.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Key Connection Pool Parameters
 &lt;div id="key-connection-pool-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#key-connection-pool-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;There aren&amp;rsquo;t that many parameters. Let&amp;rsquo;s pick the important ones:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Parameter&lt;/th&gt;
 &lt;th&gt;Meaning&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;minimumIdle&lt;/td&gt;
 &lt;td&gt;This property controls the minimum number of idle connections HikariCP tries to maintain in the pool. If the number of idle connections drops below this value and the total number of connections in the pool is less than maximumPoolSize, HikariCP will do its best to quickly and efficiently add additional connections. However, for maximum performance and responsiveness to peak demand, we recommend not setting this value and instead letting HikariCP act as a fixed-size connection pool. Default: same as maximumPoolSize.&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;maximumPoolSize&lt;/td&gt;
 &lt;td&gt;This property controls the maximum size the pool can reach, including both idle and in-use connections. Basically, this value determines the upper limit of actual connections to the database backend. A reasonable value is best determined by your execution environment. When the pool reaches this size and no idle connections are available, calls to getConnection() will block until timeout after connectionTimeout milliseconds. Default: 10&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;maxLifetime&lt;/td&gt;
 &lt;td&gt;This property controls the maximum lifetime of connections in the pool. A connection in use will never be retired — it is only removed when closed. To avoid mass connection eviction in the pool, this property applies a slight negative attenuation to each connection. We strongly recommend setting this value, and it should be a few seconds shorter than any database or infrastructure-imposed connection time limit. A value of 0 means no maximum lifetime (infinite lifetime), subject to idleTimeout constraints. Minimum allowed: 30000ms (30 seconds). Default: 1800000 (30 minutes).&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;idleTimeout&lt;/td&gt;
 &lt;td&gt;This property controls the maximum time a connection is allowed to sit idle in the pool. This setting only applies when minimumIdle is defined as less than maximumPoolSize. Once the pool reaches minimumIdle connections, idle connections are not retired. Whether a connection is considered idle and retired has a maximum variation of +30 seconds, with an average variation of +15 seconds. A connection is never considered idle and retired before this timeout. A value of 0 means idle connections are never removed from the pool. Minimum allowed: 10000ms (10 seconds). Default: 600000 (10 minutes).&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;keepaliveTime&lt;/td&gt;
 &lt;td&gt;This property controls how frequently HikariCP will attempt to keep a connection alive to prevent it from timing out due to database or network infrastructure. This value must be less than maxLifetime. The &amp;ldquo;keepalive&amp;rdquo; operation only occurs on idle connections. Minimum allowed: 30000ms (30 seconds), but the ideal value is in the range of a few minutes. Default: 0 (disabled).&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The keepaliveTime parameter should be set lower than the database idle connection timeout, TCP idle connection timeout, and all other infrastructure idle timeouts. For PostgreSQL, HikariCP&amp;rsquo;s keepaliveTime should be set to less than PG&amp;rsquo;s &lt;code&gt;idle_in_transaction_session_timeout&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Clearly, maximumPoolSize represents the maximum number of connections to the database. Of course, in general, the actual number of connections in the database won&amp;rsquo;t always stay at maximumPoolSize because the application can&amp;rsquo;t run at peak load from start to finish. Even after a request peak passes, those idle connections should be released after some time according to idleTimeout or maxLifetime settings. To ensure database availability, this value should be set somewhat lower than the database&amp;rsquo;s maximum connections. For PostgreSQL, maximumPoolSize should be set to less than PG&amp;rsquo;s &lt;code&gt;max_connections&lt;/code&gt;. There&amp;rsquo;s room for tuning this parameter, which we&amp;rsquo;ll discuss below.&lt;/p&gt;
&lt;p&gt;minimumIdle is the minimum number of idle connections. For example, if minimumIdle=100 and the database has 10 active sessions, theoretically the total connections in the database should be 100+10. Due to possible connection storms, the actual database connections might be slightly more than active+minimumIdle, but certainly less than maximumPoolSize.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Why are database connections far greater than minimumIdle?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Theoretically, total database connections should only be slightly more than minimumIdle. However, from my actual observation of multi-node connection pool scenarios, even with only 10+ active connections, total database connections far exceed minimumIdle. Observing min(backend_start) and min(state_change) in pg_stat_activity, they stay around maxLifetime, indicating that connection recycling is working. It seems new requests always prefer to establish new connections rather than reuse existing idle ones. Personally, I suspect multi-node deployment is one reason — each node has a low minimumIdle, and some component nodes may have more requests, with instantaneous request counts exceeding minimumIdle, thus creating new connections. Second, it&amp;rsquo;s related to the maxLifetime parameter — maxLifetime&amp;rsquo;s purpose is to rotate connections, releasing those constantly in use. This means used connections need time to be released and ideally shouldn&amp;rsquo;t be reused to avoid extending the release cycle.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Connection Pool Sizing
 &lt;div id="connection-pool-sizing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#connection-pool-sizing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Impact of Excessive Connections
 &lt;div id="impact-of-excessive-connections" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#impact-of-excessive-connections" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;In the database world, &amp;ldquo;as the number of database connections increases, database performance always degrades to some extent.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;For example, Oracle&amp;rsquo;s connection count impact on performance — refer to &lt;a href="https://www.youtube.com/watch?v=_C77sBcAtSQ" target="_blank" rel="noreferrer"&gt;this video&lt;/a&gt;. With unchanged resource configuration and JDBC concurrency, reducing connections from 2048 to 1024 halved the request response time; reducing to 96 connections dropped response time by tens of times!&lt;/p&gt;

&lt;h3 class="relative group"&gt;What&amp;rsquo;s the Right Number of Connections?
 &lt;div id="whats-the-right-number-of-connections" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#whats-the-right-number-of-connections" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;Unless you have a database server that has 1000 cores, it is very unlikely that you really want a maximumPoolSize of 2000.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Unless your database has 1000 cores, you shouldn&amp;rsquo;t have 2000 connections.&lt;/p&gt;
&lt;p&gt;At the most basic level, the database connection count should be set to the number of CPU cores — this achieves maximum CPU performance mode. But this isn&amp;rsquo;t the full picture. Database consumption isn&amp;rsquo;t just on CPU, but also on disk and network (memory too, but with relatively less impact). For example, disk reads/writes also take time, and the CPU must wait for disk data to return before proceeding. During I/O wait periods (which can be quite long), it&amp;rsquo;s better for the CPU not to be idle but to serve other processes. Therefore, based on waiting times for disk and other devices, the database connection count should ideally be higher than the number of CPU cores.&lt;/p&gt;
&lt;p&gt;Due to SSD and other disk performance improvements, disk access is now very fast — meaning I/O wait times have decreased, implying connection counts should be tuned even lower.&lt;/p&gt;
&lt;p&gt;Tuning too low fails to fully utilize CPU; tuning too high degrades database performance. So what&amp;rsquo;s the right number? HikariCP provides this formula:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;connections = ((core_count * 2) + effective_spindle_count)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Where core_count should not count hyperthreading; effective_spindle_count is the spindle count — if the active dataset is fully cached, effective_spindle_count is zero; as cache hit rate decreases, it should approach the actual spindle count. There&amp;rsquo;s no established formula for SSDs yet, but it&amp;rsquo;s certainly less than the above maximum. Of course, these are all theoretical values — real-world situations are more complex, e.g., long connection issues. See &lt;a href="https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing" target="_blank" rel="noreferrer"&gt;About Pool Sizing&lt;/a&gt; for details.&lt;/p&gt;
&lt;p&gt;Even with 10,000 frontend users, the connection pool cannot be 10,000 — even 1,000 is too many. A smaller connection count, with remaining requests waiting in the pool queue, is the best way to maximize database and CPU performance. See the formula above for connection count settings.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Fixed Pool
 &lt;div id="fixed-pool" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fixed-pool" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Fixed pool is a concept advocated by HikariCP&amp;rsquo;s author Brett Wooldridge to solve the connection storm problem. The concept is already mentioned in the minimumIdle parameter description:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;For maximum performance and responsiveness to peak demand, we recommend not setting minimumIdle and instead letting HikariCP act as a fixed-size connection pool. Default: same as maximumPoolSize.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Setting minimumIdle=maximumPoolSize creates a fixed-size connection pool. minimumIdle&amp;rsquo;s default value equals maximumPoolSize.&lt;/p&gt;
&lt;p&gt;As early as 2014, Brett Wooldridge mentioned this concept — see the &lt;a href="https://www.postgresql.org/message-id/DF286FBF-D1F5-4A10-88AD-EDD5D2AFAABD%40gmail.com" target="_blank" rel="noreferrer"&gt;PG community mailing list&lt;/a&gt;. This passage is important, so I&amp;rsquo;ll translate it verbatim:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;In my experience, even pools that maintain a minimum number of idle connections are problematic in responding to burst demand. If you have a pool with a maximum of 30 connections and a target of 10 minimum idle connections, a burst demand requiring 20 connections means the pool can immediately satisfy 10, but then must try to establish another 10 connections before the application&amp;rsquo;s connection request reaches connectionTimeout. This in turn creates burst demand on the database, slowing down not just connection establishment itself but also transactions that might actually be returning connections to the pool.&lt;/p&gt;
&lt;p&gt;Now, if your peak is 100 connections and your median is 50, this doesn&amp;rsquo;t matter. But I know many workloads where the peak is 1000 and the median is 25 — in such cases you&amp;rsquo;d want to gradually reduce idle connections.&lt;/p&gt;
&lt;p&gt;Ultimately, we adopted a maxPoolSize + minIdle model, where by default they are equal (fixed pool).&lt;/p&gt;
&lt;p&gt;While I don&amp;rsquo;t doubt that such workloads (1000 active connections) exist, if someone is actually doing this, I&amp;rsquo;d love to hear their reasoning. Unless they have over 128 CPU cores and solid-state storage, they&amp;rsquo;re basically wasting effort.&lt;/p&gt;
&lt;p&gt;This also means that even if the pool size is fixed, you want to rotate actual sessions in and out so they don&amp;rsquo;t hang onto maximum virtual memory indefinitely.&lt;/p&gt;
&lt;p&gt;We do this with a maxLifeTime setting to rotate these connections.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In real scenarios, fixed pool&amp;rsquo;s protection against connection storm impact is visible. Under fixed pool, when the database&amp;rsquo;s instantaneous active connections spike, the idle connection count drops but the total connection count remains unchanged, and request response time is minimally affected. If maximumPoolSize is set to a value higher than minimumIdle, a connection storm can cause many new sessions to be created instantly, and new session creation is very resource-intensive — this significantly increases request response time.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Connection Leak Case Study
 &lt;div id="connection-leak-case-study" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#connection-leak-case-study" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since I&amp;rsquo;m not a connection pool expert, I&amp;rsquo;ll just summarize some recently found connection leak information here.&lt;/p&gt;
&lt;p&gt;Connection leaks exhibit the following symptoms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;Connection is not available&amp;rdquo; exception. Connection leaks, pool saturation, or the database being overwhelmed by excessive active sessions — new requests error out after exceeding &lt;code&gt;connectionTimeout&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Growth of active connections. Database monitoring clearly shows an increase in active sessions.&lt;/li&gt;
&lt;li&gt;Application logs. Application logs also show many connection requests, including active session information.&lt;/li&gt;
&lt;li&gt;Database views and logs. &lt;code&gt;pg_stat_activity&lt;/code&gt; shows all session states and specific SQL, and logs show new connection authentication information.&lt;/li&gt;
&lt;li&gt;HikariCP leak detection. Requires enabling &lt;code&gt;leakDetectionThreshold&lt;/code&gt;. HikariCP can detect connection leaks — this parameter is off by default.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For locating connection leaks, you should:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Check application logs, especially around the time the problem first occurred.&lt;/li&gt;
&lt;li&gt;Have a proper monitoring system.&lt;/li&gt;
&lt;li&gt;Be proficient with debug, trace, and other HikariCP settings.&lt;/li&gt;
&lt;li&gt;Set the &lt;code&gt;leakDetectionThreshold&lt;/code&gt; parameter.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Possible causes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Misuse of streaming responses;&lt;/li&gt;
&lt;li&gt;Misuse of raw connections;&lt;/li&gt;
&lt;li&gt;Prolonged operations within &lt;code&gt;@Transactional&lt;/code&gt; method (such as network invocation).&lt;/li&gt;
&lt;li&gt;Configuration errors, &lt;a href="https://mkyong.com/jdbc/hikaripool-1-connection-is-not-available-request-timed-out-after-30002ms/" target="_blank" rel="noreferrer"&gt;reference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Virtual threads, &lt;a href="https://github.com/brettwooldridge/HikariCP/issues/2151" target="_blank" rel="noreferrer"&gt;reference&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://github.com/brettwooldridge/HikariCP" target="_blank" rel="noreferrer"&gt;https://github.com/brettwooldridge/HikariCP&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/brettwooldridge/HikariCP/issues/2148" target="_blank" rel="noreferrer"&gt;https://github.com/brettwooldridge/HikariCP/issues/2148&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing" target="_blank" rel="noreferrer"&gt;https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blogs.oracle.com/opal/post/always-use-connection-pools" target="_blank" rel="noreferrer"&gt;https://blogs.oracle.com/opal/post/always-use-connection-pools&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mkyong.com/jdbc/hikaripool-1-connection-is-not-available-request-timed-out-after-30002ms/" target="_blank" rel="noreferrer"&gt;https://mkyong.com/jdbc/hikaripool-1-connection-is-not-available-request-timed-out-after-30002ms/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://medium.com/@eremeykin/how-to-deal-with-hikaricp-connection-leaks-part-1-1eddc135b464" target="_blank" rel="noreferrer"&gt;https://medium.com/@eremeykin/how-to-deal-with-hikaricp-connection-leaks-part-1-1eddc135b464&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://medium.com/@eremeykin/how-to-deal-with-hikaricp-connection-leaks-part-2-847a9629627f" target="_blank" rel="noreferrer"&gt;https://medium.com/@eremeykin/how-to-deal-with-hikaricp-connection-leaks-part-2-847a9629627f&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>History of Transactions and SSI — PostgreSQL Database Technology Summit Chengdu Stop Sharing</title><link>https://lastdba.com/en/2024/08/12/history-of-transactions-and-ssi-postgresql-database-technology-summit-chengdu-stop-sharing/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/history-of-transactions-and-ssi-postgresql-database-technology-summit-chengdu-stop-sharing/</guid><description>&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;PostgreSQL Database Technology Summit Chengdu Stop
 &lt;div id="postgresql-database-technology-summit-chengdu-stop" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-database-technology-summit-chengdu-stop" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Recently (June 17, 2023), the &amp;ldquo;PostgreSQL Database Technology Summit Chengdu Stop&amp;rdquo; organized by the PostgreSQL branch of the China Open Source Software Promotion Alliance was successfully held. I had the honor of participating as a speaker and gained a lot from it.



&lt;img src="https://lastdba.com/img/csdn/09a770e6512b.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;(Summit review and all PPT downloads: &lt;a href="https://mp.weixin.qq.com/s/Gby7uHVV3bR-HvROZCg46Q" target="_blank" rel="noreferrer"&gt;PPT downloads are here | PostgreSQL Technology Summit Chengdu Stop Review&lt;/a&gt;)&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Preface
 &lt;div id="preface" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#preface" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;PostgreSQL Database Technology Summit Chengdu Stop
 &lt;div id="postgresql-database-technology-summit-chengdu-stop" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-database-technology-summit-chengdu-stop" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Recently (June 17, 2023), the &amp;ldquo;PostgreSQL Database Technology Summit Chengdu Stop&amp;rdquo; organized by the PostgreSQL branch of the China Open Source Software Promotion Alliance was successfully held. I had the honor of participating as a speaker and gained a lot from it.



&lt;img src="https://lastdba.com/img/csdn/09a770e6512b.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;(Summit review and all PPT downloads: &lt;a href="https://mp.weixin.qq.com/s/Gby7uHVV3bR-HvROZCg46Q" target="_blank" rel="noreferrer"&gt;PPT downloads are here | PostgreSQL Technology Summit Chengdu Stop Review&lt;/a&gt;)&lt;/p&gt;

&lt;h3 class="relative group"&gt;My Sharing
 &lt;div id="my-sharing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#my-sharing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;My technical sharing topic was: &lt;strong&gt;Database History and SSI&lt;/strong&gt;.
I&amp;rsquo;ve noticed that many domestic technical blogs describe transactions inaccurately, which can confuse beginners. Additionally, many colleagues aren&amp;rsquo;t very familiar with transaction history and SSI in PostgreSQL. This time, I collected and summarized accurate definitions of transactions, transaction history, and SSI theoretical foundations from Wikipedia, official SQL standards, and various papers. The main thread of the sharing goes from transaction history to anomalies not present in the SQL-92 standard, to how these anomalies can be eliminated, gradually progressing to how SSI is implemented in PostgreSQL.
The entire sharing is divided into 4 parts: Transaction Fundamentals, Transaction History, SSI Theoretical Knowledge, and SSI in PostgreSQL.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Transaction Fundamentals
 &lt;div id="transaction-fundamentals" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-fundamentals" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Before understanding transaction history and SSI, let&amp;rsquo;s review and revisit some basic transaction knowledge. The entire chapter will revolve around discussing transactions, and basic transaction knowledge will lead into the problems in transaction history.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What is a Transaction?
 &lt;div id="what-is-a-transaction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-transaction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Original meaning of transaction&lt;/strong&gt;: A transaction is an exchange, a deal. Exchange is the original meaning of transaction, and what we call transactions in databases comes from this word.
&lt;strong&gt;Database transaction&lt;/strong&gt;: &lt;em&gt;A transaction is the basic unit of work in a relational database&lt;/em&gt;. For example:
Deleting data from table A and inserting data into table B — we can wrap these two actions into one transaction. Both must complete. But due to unexpected factors, the transaction might fail or be canceled halfway through execution. In that case, all operations in the entire transaction must roll back to the state before the transaction — A doesn&amp;rsquo;t delete and B doesn&amp;rsquo;t insert.&lt;/p&gt;

&lt;h3 class="relative group"&gt;ACID
 &lt;div id="acid" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#acid" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ACID is an important characteristic of database transactions. It determines whether a transaction is reliable and trustworthy.



&lt;img src="https://lastdba.com/img/csdn/cc73d604121e.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Atomicity&lt;/strong&gt;: All operations within a transaction either complete entirely or cancel entirely.
Like atoms in chemistry — indivisible and unsplittable. If a transaction encounters a problem midway and fails to execute, the entire transaction must roll back.
&lt;strong&gt;Consistency&lt;/strong&gt;: When a transaction completes, all data remains in a consistent state.
This definition is actually somewhat vague. Transactions generally operate on data, and the state of data in the database gets updated. Due to transaction operations, data transitions from one state to another. This state must be reasonable and legitimate — the data logic must be consistent with real-world logic. This might be abstract, so here&amp;rsquo;s an example: Say A has 100 yuan, B has 200 yuan, their combined total is 300 yuan. Now B transfers 100 yuan to A. Then A has 200 yuan, B has 100 yuan, and their combined total is still 300 yuan. Key point: &lt;em&gt;The data changes in this virtual world should remain consistent with real-world logic&lt;/em&gt;.
&lt;strong&gt;Isolation&lt;/strong&gt;: The result of executing multiple transactions concurrently must be the same as executing them separately one after another.
For example, with 2 transactions, executing them serially one after another must produce the same result as executing them in parallel. (This is the official understanding from Wikipedia and the definition in the SQL standard — please remember this definition, as it&amp;rsquo;s the focus of this article.)
&lt;strong&gt;Durability&lt;/strong&gt;: After a transaction completes, changes to data are permanent.
If updated data is placed in memory and disappears when the machine powers off, then it should go to disk. But is disk storage safe? What if the disk fails? We could have a high-availability architecture writing multiple copies of data. Extending further, we could have geographic-level disaster recovery. But if we push further — what if multiple regions all fail? From an architectural perspective, this question seems to have no answer. But from the &lt;em&gt;user&amp;rsquo;s perspective&lt;/em&gt;, it&amp;rsquo;s actually easier to understand. For example, when a user deposits money — they put the cash in, and their account should display that amount. This number is permanent for the user. The user believes that even if the sky falls, their account should have this number. That is the meaning of durability.&lt;/p&gt;

&lt;h3 class="relative group"&gt;ANSI SQL-92 Standard
 &lt;div id="ansi-sql-92-standard" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ansi-sql-92-standard" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/41ff840d82af.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;In 1992, the American National Standards Institute ANSI SQL-92 standard defined 4 isolation levels and 3 anomaly phenomena.
Although the database industry today mostly follows ISO international standards,



&lt;img src="https://lastdba.com/img/csdn/32fd2f2e70e0.png" alt="Image" /&gt;
this 1992 American standard had a huge impact on the database industry. I believe many database practitioners are familiar with the 4 isolation levels.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Isolation Levels in the SQL-92 Standard
 &lt;div id="isolation-levels-in-the-sql-92-standard" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#isolation-levels-in-the-sql-92-standard" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ANSI SQL-92 defines 4 isolation levels:



&lt;img src="https://lastdba.com/img/csdn/7b6b2fd5d336.png" alt="Image" /&gt;
Transaction isolation levels from high to low. Notice Serializable: when all transactions in the system execute in parallel, there is no difference from executing them serially — transactions do not affect each other. Doesn&amp;rsquo;t this resemble the definition of Isolation in ACID?
All 4 isolation levels can satisfy all-or-nothing execution of transactions. They only differ in their definitions of isolation. All isolation levels can have atomicity, consistency, and durability, but different isolation levels have different isolation characteristics. &lt;strong&gt;By definition, only Serializable fully satisfies ACID&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Anomaly Phenomena in the SQL-92 Standard
 &lt;div id="anomaly-phenomena-in-the-sql-92-standard" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#anomaly-phenomena-in-the-sql-92-standard" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The SQL-92 standard defines 3 anomaly phenomena. There are many definitions online, but many are not entirely accurate. Here we directly extract the definitions of the 3 anomaly phenomena from the &lt;em&gt;SQL-92 standard document&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/80b283e33a76.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dirty Read&lt;/strong&gt;: Transaction T1 updates a row. Transaction T2 can read this row before T1 commits. If T1 executes a rollback, T2 will have read a row that was never committed.
&lt;em&gt;Dirty reads have an obvious problem — the user may not know whether the money has actually arrived. Before the transaction completes, the user can query and see money transferred into the account, but if the transaction fails and rolls back for some reason, the money disappears again. This is hard for users to understand.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Non-repeatable Read&lt;/strong&gt;: Transaction T1 reads a row. Transaction T2 updates or deletes that row and commits. If T1 reads that row again, it will find the row has been changed or deleted.
&lt;strong&gt;Phantom Read&lt;/strong&gt;: Transaction T1 reads N rows matching certain conditions. Transaction T2 executes SQL that generates rows satisfying these conditions. When T1 reads again, it finds inconsistent row results.
&lt;em&gt;The difference between non-repeatable read and phantom read is: one is caused by other transactions updating or deleting leading to inconsistent reads within the same transaction; the other is caused by other transactions inserting leading to inconsistent reads within the same transaction.&lt;/em&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;SQL-92 Standard and PostgreSQL
 &lt;div id="sql-92-standard-and-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sql-92-standard-and-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2784c3889598.png" alt="Image" /&gt;
In the SQL-92 standard, isolation levels and anomaly phenomena have a stepped relationship. Except for Serializable which has no anomalies, each isolation level adds anomaly phenomena step by step. Now let&amp;rsquo;s look at the following table — this is the isolation levels and anomaly phenomena in PostgreSQL, which is &lt;strong&gt;different&lt;/strong&gt; from the SQL-92 standard.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why is PostgreSQL&amp;rsquo;s isolation level inconsistent with the SQL-92 standard?
 &lt;div id="why-is-postgresqls-isolation-level-inconsistent-with-the-sql-92-standard" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-is-postgresqls-isolation-level-inconsistent-with-the-sql-92-standard" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Why is Read Uncommitted inconsistent with the SQL-92 standard? Read Uncommitted is simply too strange. In relational databases, it&amp;rsquo;s hard to imagine a scenario for using Read Uncommitted. It severely violates transaction isolation. PostgreSQL treats &amp;ldquo;Read Uncommitted&amp;rdquo; as &amp;ldquo;Read Committed.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Why is Repeatable Read inconsistent with the SQL-92 standard? PostgreSQL implements MVCC (Multi-Version Concurrency Control) through snapshots. The Repeatable Read level in PostgreSQL is actually the Snapshot Isolation level, which doesn&amp;rsquo;t have the Phantom Read anomaly.&lt;/li&gt;
&lt;li&gt;Although the SQL-92 standard has far-reaching influence, many databases haven&amp;rsquo;t fully implemented it.&lt;/li&gt;
&lt;li&gt;The ANSI SQL-92 standard has vague definitions. The SQL-92 standard is very representative in the database industry — &lt;em&gt;&amp;ldquo;It&amp;rsquo;s good, but not good enough.&amp;rdquo;&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Transaction History
 &lt;div id="transaction-history" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#transaction-history" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;History of Transactions
 &lt;div id="history-of-transactions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#history-of-transactions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;To understand &lt;em&gt;&amp;ldquo;It&amp;rsquo;s good, but not good enough,&amp;rdquo;&lt;/em&gt; we need to review transaction history, going back 40 years.



&lt;img src="https://lastdba.com/img/csdn/9510b3d76415.png" alt="Image" /&gt;
Notice the timing of the SQL-92 standard and the &amp;ldquo;Critique of SQL-92.&amp;rdquo; Although the SQL-92 standard was &amp;ldquo;flawed,&amp;rdquo; it still had a profound impact on the database industry. Subsequently, after many serializability theories were proven, PostgreSQL became the first commercial database to implement SSI.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Critique of the SQL-92 Standard
 &lt;div id="critique-of-the-sql-92-standard" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#critique-of-the-sql-92-standard" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Shortly after the SQL-92 standard was released, some Microsoft engineers and academics critiqued it and proposed more isolation levels and anomaly phenomena.
Where the SQL-92 standard defined 4 isolation levels and 3 anomaly phenomena, the &amp;ldquo;Critique of SQL-92&amp;rdquo; had 6 isolation levels and 8 anomaly phenomena.



&lt;img src="https://lastdba.com/img/csdn/9e1cc8e57c9e.png" alt="Image" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;More isolation levels and anomaly phenomena appeared — they were not defined in ANSI SQL-92.&lt;/li&gt;
&lt;li&gt;Snapshot Isolation sits between Repeatable Read and Serializable. This is also one of the reasons why PostgreSQL&amp;rsquo;s Repeatable Read and Serializable look so similar.&lt;/li&gt;
&lt;li&gt;The Write Skew anomaly was identified. It occurs at the Snapshot Isolation level.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Isolation Levels of Popular Databases
 &lt;div id="isolation-levels-of-popular-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#isolation-levels-of-popular-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/de1258812b3c.png" alt="Image" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;MySQL at Serializable isolation level: reads acquire shared read locks on data, meaning reads block writes.&lt;/li&gt;
&lt;li&gt;Oracle can also set the Serializable isolation level and claims to support serializability, but it&amp;rsquo;s not true serializability — it&amp;rsquo;s just Snapshot Isolation.&lt;/li&gt;
&lt;li&gt;PostgreSQL supports Serializable. It implements serializability on top of Snapshot Isolation, fully named Serializable Snapshot Isolation (SSI), where reads and writes do not block each other.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can see the differences among the three — only PostgreSQL&amp;rsquo;s Serializable has real substance.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Why Did Oracle Deceive Us?
 &lt;div id="why-did-oracle-deceive-us" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-did-oracle-deceive-us" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;What did Oracle deceive us about? It passed off the &lt;em&gt;Snapshot Isolation&lt;/em&gt; isolation level as the &lt;em&gt;Serializable&lt;/em&gt; isolation level.
Why did this happen?
If we add Snapshot Isolation to the ANSI SQL-92 standard:



&lt;img src="https://lastdba.com/img/csdn/88c0c04ad181.png" alt="Image" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The SQL-92 standard defines fewer anomaly phenomena and doesn&amp;rsquo;t define Snapshot Isolation. By the SQL-92 standard&amp;rsquo;s view, Snapshot Isolation looks similar to Serializable.&lt;/li&gt;
&lt;li&gt;Most relational databases follow the SQL-92 standard, including Oracle. But when better standards later emerged, they didn&amp;rsquo;t make changes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Why Do Weak Isolation Levels Have Academic Problems but Few Serious Real-World Issues?
 &lt;div id="why-do-weak-isolation-levels-have-academic-problems-but-few-serious-real-world-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-do-weak-isolation-levels-have-academic-problems-but-few-serious-real-world-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Anomaly phenomena at non-serializable isolation levels generally require high concurrency to manifest. Low-concurrency databases are unlikely to encounter problems.&lt;/li&gt;
&lt;li&gt;When anomaly phenomena do occur, some applications may not notice them, or may detect anomalies but find them unimportant.&lt;/li&gt;
&lt;li&gt;Data might be anomalous, but the application simply returns an error and enters an anomaly handling routine.&lt;/li&gt;
&lt;li&gt;Costs are too high. Not only is the development cost of database serializable isolation levels high, but applications also need adaptation costs for serializability. Just understanding this complex theory is no easy task.&lt;/li&gt;
&lt;li&gt;High-level isolation loses some performance. Extensive modification work may be thankless — applications need to choose between &amp;ldquo;high concurrency&amp;rdquo; and &amp;ldquo;no anomaly phenomena.&amp;rdquo;&lt;/li&gt;
&lt;li&gt;Businesses develop based on mechanisms rather than rules. Businesses somewhat adapt to the anomaly phenomena of weak isolation levels, especially Read Committed.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;What&amp;rsquo;s the Point of Serializable?
 &lt;div id="whats-the-point-of-serializable" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#whats-the-point-of-serializable" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If weak isolation seems to work fine in the real world, what&amp;rsquo;s the point of Serializable? There is actually a point:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Although applications adapt to weak isolation levels, it doesn&amp;rsquo;t mean they truly understand them.&lt;/li&gt;
&lt;li&gt;Using Serializable, applications can greatly reduce concerns about data anomalies.&lt;/li&gt;
&lt;li&gt;Except for Serializable, all other isolation levels have their own anomaly phenomena and don&amp;rsquo;t fully satisfy ACID&amp;rsquo;s Isolation property.&lt;/li&gt;
&lt;li&gt;Serializable can eliminate anomaly phenomena — the &amp;ldquo;termites&amp;rdquo; — fully ensuring data safety.&lt;/li&gt;
&lt;li&gt;Serializable has been proven theoretically achievable.&lt;/li&gt;
&lt;li&gt;Some serializable implementations do significantly reduce concurrency, but there are other implementations with minimal concurrency impact. For example, Serializable Snapshot Isolation (SSI).&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;SSI Theoretical Knowledge
 &lt;div id="ssi-theoretical-knowledge" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ssi-theoretical-knowledge" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After all that about transaction fundamentals and history, we finally arrive at the concept of SSI. But before understanding SSI, we need to understand two more concepts: Serializable and Snapshot Isolation.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Serializable
 &lt;div id="serializable" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#serializable" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/71f922c0363f.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Meaning of Serializable&lt;/strong&gt;
If each transaction itself is correct (satisfying certain integrity conditions), then any serial schedule including these transactions is correct (its transactions still satisfy their conditions): &amp;ldquo;Serial&amp;rdquo; means transactions don&amp;rsquo;t overlap in time and cannot interfere with each other — i.e., there exists complete isolation between them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Implementation of Serializable&lt;/strong&gt;
In early transaction development, Serializable was implemented through Strict Two-Phase Locking (S2PL), where reads and writes block each other until the transaction ends. This eliminated anomaly phenomena but S2PL lost high performance.
Besides S2PL, there are other ways to achieve serializability, such as Serializable Snapshot Isolation (SSI).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Significance of Serializable&lt;/strong&gt;
To ensure no anomalies, Serializable sacrifices some concurrency (varying by implementation approach), but it truly guarantees ACID isolation for data. That is to say, databases that haven&amp;rsquo;t implemented serializability don&amp;rsquo;t fully support ACID properties.
Serializable has been proven theoretically achievable, but the real database world is somewhat &amp;ldquo;abnormal.&amp;rdquo; In practice, Serializable is the highest transaction isolation level and is strongly recommended by academics and industry leaders, yet the vast majority of databases run at Read Committed or Snapshot Isolation levels.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Snapshot Isolation
 &lt;div id="snapshot-isolation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#snapshot-isolation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Definition of Snapshot Isolation&lt;/strong&gt;
Transactions executing under Snapshot Isolation operate on a snapshot of the database taken at the start of the transaction. When the transaction ends, it will only commit successfully if the values it updated haven&amp;rsquo;t been externally changed since the snapshot was taken.
As the name implies, Snapshot Isolation uses snapshots, which are widely used to implement MVCC, enabling multi-version concurrency mechanisms to support concurrent transaction execution by users.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Emergence of Snapshot Isolation&lt;/strong&gt;
ANSI SQL-92 did not define Snapshot Isolation (SI). This isolation level emerged as the database industry evolved. The 1992 ANSI SQL-92 standard was defined based on database locks, so there was no definition for the Snapshot Isolation level. It wasn&amp;rsquo;t proposed until the 1995 &amp;ldquo;Critique&amp;rdquo; appeared.&lt;/p&gt;

&lt;h3 class="relative group"&gt;SSI
 &lt;div id="ssi" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ssi" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8142817784c7.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Serializable Snapshot Isolation (SSI)&lt;/strong&gt;
Given the widespread use of Snapshot Isolation and the academic goal that databases should achieve the Serializable isolation level, Serializable Snapshot Isolation (SSI), as the name suggests, implements serializability on top of Snapshot Isolation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why SSI?&lt;/strong&gt;
Due to the vagueness of the ANSI SQL-92 standard, although it didn&amp;rsquo;t define Snapshot Isolation, many databases actually use it. And Snapshot Isolation also has some anomaly phenomena (including Write Skew). SSI emerged to address these anomaly phenomena.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advantages of SSI over S2PL&lt;/strong&gt;
Traditional serializability is implemented through S2PL. Under S2PL, write operations block other transactions&amp;rsquo; reads and writes. Although it achieves serializability without Write Skew anomalies, it generates many lock conflicts, reducing concurrency performance. In contrast, MVCC implemented through snapshots has non-blocking reads and writes, with only write-write conflicts. SSI built on this foundation has much less impact on concurrency compared to traditional S2PL.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PostgreSQL Implements SSI&lt;/strong&gt;
PostgreSQL began implementing SSI in version 9.1, becoming the first commercial database to implement SSI.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Three Types of Dependencies
 &lt;div id="three-types-of-dependencies" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#three-types-of-dependencies" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/bd6547d3dd94.png" alt="Image" /&gt;
&lt;strong&gt;Read-Write Dependency (wr):&lt;/strong&gt; Transaction T1 writes a version of a data item, and transaction T2 reads this version, meaning T1 precedes T2.
&lt;strong&gt;Write-Write Dependency (ww):&lt;/strong&gt; Transaction T1 writes a version of a data item, and transaction T2 replaces this version with a new one, meaning T1 precedes T2.
&lt;strong&gt;Read-Write Anti-dependency (rw):&lt;/strong&gt; Transaction T1 writes a version of a data item, and transaction T2 reads the version before this one, meaning T2 precedes T1.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Write Skew Theory
 &lt;div id="write-skew-theory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#write-skew-theory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When certain conflicts form a cycle, serialization anomalies occur. That is to say, some concurrently executing transactions are theoretically non-serializable. One of the more easily understood examples is Write Skew.
Write skew only occurs in the rw model — ww and wr won&amp;rsquo;t cause write skew — and transactions must be under concurrent conditions for it to appear.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/30043eef5317.png" alt="Image" /&gt;
Simple Write Skew: Transaction T1 has an rw anti-dependency on T2, and T2 also has an rw anti-dependency on T1. The concurrent execution of these two transactions is non-serializable.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Real-World Write Skew Problems
 &lt;div id="real-world-write-skew-problems" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#real-world-write-skew-problems" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Many real-world cases can produce Write Skew anomalies. Let&amp;rsquo;s use the classic black-and-white ball problem to understand Write Skew:



&lt;img src="https://lastdba.com/img/csdn/4dada0644aac.png" alt="Image" /&gt;
There are 4 balls in a bag: 2 white and 2 black. Now there are two transactions, P and Q. P changes all black balls to white, Q changes all white balls to black. There can be two serial executions: &amp;lt;P, Q&amp;gt; or &amp;lt;Q, P&amp;gt;. In both cases, the final result is 4 white balls or 4 black balls.
&lt;strong&gt;However&lt;/strong&gt;, Snapshot Isolation allows another result:
Transaction P takes out 2 black balls
Transaction Q takes out 2 white balls
Transaction P changes all black balls in hand to white and puts them back
Transaction Q changes all white balls in hand to black and puts them back
Now the bag still has 2 black balls and 2 white balls. This is impossible in any serial execution.
But this is valid under Snapshot Isolation: each transaction maintains a consistent view of the database, and its write set doesn&amp;rsquo;t overlap with any concurrent transaction&amp;rsquo;s write set, resulting in the white and black balls exchanging.&lt;/p&gt;
&lt;p&gt;We can also make the problem more concrete and practical. Here&amp;rsquo;s a rough example: Suppose I have several bank cards, half frozen and half unfrozen. At one terminal, I execute freezing all cards. At another terminal, I immediately execute unfreezing all cards. From an intent perspective, my cards should all be unfrozen. But a strange phenomenon occurs: previously frozen cards become unfrozen, and previously unfrozen cards become frozen. As a customer, I would be confused.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The black-and-white ball problem illustrates&lt;/strong&gt;: Snapshot Isolation execution results are inconsistent with Serializable execution results. Under Snapshot Isolation, a Write Skew anomaly occurs, and data results don&amp;rsquo;t match expectations.&lt;/p&gt;

&lt;h2 class="relative group"&gt;SSI in PostgreSQL
 &lt;div id="ssi-in-postgresql" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ssi-in-postgresql" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;How PostgreSQL Handles SSI
 &lt;div id="how-postgresql-handles-ssi" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-postgresql-handles-ssi" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;It&amp;rsquo;s actually simple — cancel the pivot transaction that forms the &amp;ldquo;dangerous structure.&amp;rdquo;
We first set the isolation level to Serializable for both. The table has some white balls and some black balls.&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;T1&lt;/th&gt;
 &lt;th&gt;T2&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;set default_transaction_isolation = &amp;lsquo;serializable&amp;rsquo;;&lt;/td&gt;
 &lt;td&gt;set default_transaction_isolation = &amp;lsquo;serializable&amp;rsquo;;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;begin; update dots set color = &amp;lsquo;black&amp;rsquo; where color = &amp;lsquo;white&amp;rsquo;;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;begin; update dots set color = &amp;lsquo;white&amp;rsquo; where color = &amp;lsquo;black&amp;rsquo;;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;commit;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;commit;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;code&gt;ERROR: could not serialize access due to read/write dependencies among transactions DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt. HINT: The transaction might succeed if retried.&lt;/code&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Transaction 1 changes all white to black, Transaction 2 changes all black to white, then both commit. The first transaction to commit succeeds, the second fails. The error says: &lt;em&gt;could not serialize access due to read/write dependencies among transactions, canceled on identification as a pivot. If you retry the transaction, it might succeed.&lt;/em&gt;
Of course it would succeed here — the other transaction has already completed, so one transaction alone cannot form a dependency cycle. At other isolation levels like Repeatable Read or Read Committed, these two transactions would execute without any error, running normally, but the data results would differ from SSI&amp;rsquo;s results.&lt;/p&gt;

&lt;h3 class="relative group"&gt;PostgreSQL SSI Implementation Optimizations
 &lt;div id="postgresql-ssi-implementation-optimizations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#postgresql-ssi-implementation-optimizations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL implements Serializable SSI on top of Snapshot Isolation and has made many optimizations to improve concurrency at high isolation levels. PostgreSQL&amp;rsquo;s SSI optimizations mainly include 3 points:
&lt;strong&gt;Safe Snapshots&lt;/strong&gt;: Read-only transactions that won&amp;rsquo;t create cyclic structures don&amp;rsquo;t need conflict detection, reducing checking overhead and memory burden.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Deferrable Transactions&lt;/strong&gt;: Deferrable transactions can be retried. When a &amp;ldquo;dangerous structure&amp;rdquo; is detected, the deferrable transaction is canceled and then attempted again. Deferrable transactions need to be explicitly declared.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Detection Granularity Escalation&lt;/strong&gt;: Multiple fine-grained locks can be combined into coarse-grained locks to reduce memory overhead.&lt;/p&gt;
&lt;p&gt;Optimization Results — Performance Benchmark Comparison:



&lt;img src="https://lastdba.com/img/csdn/5bba1b13a844.png" alt="Image" /&gt;
The green line is the Snapshot Isolation baseline. The blue line shows PostgreSQL&amp;rsquo;s SSI performance, which is already very close to Snapshot Isolation. The brown line is SSI without read-only transactions — all data-changing transactions — showing how much read-only transaction optimization improves performance. In typical business systems, read-only transactions outnumber change transactions. The red line is serializability implemented through Strict Two-Phase Locking — the performance is abysmal.&lt;/p&gt;
&lt;p&gt;The table below shows concurrency pressure and transaction failure rates. Since some transactions need to be canceled to break cycles, Serializable inevitably cancels more transactions than weak isolation. This table also shows that PostgreSQL&amp;rsquo;s SSI has far higher concurrency and transaction success rates than Strict Two-Phase Locking.&lt;/p&gt;
&lt;p&gt;Optimization Results — Request Volume and Failure Rate:



&lt;img src="https://lastdba.com/img/csdn/fe74c76c6333.png" alt="Image" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Serializable can simplify system development problems. Developers don&amp;rsquo;t need to worry about transaction anomalies under concurrency, especially in today&amp;rsquo;s increasingly high-concurrency systems.&lt;/li&gt;
&lt;li&gt;PostgreSQL&amp;rsquo;s Serializable is clearly better than the Strict Two-Phase Locking model. Not only better performance, but also lower transaction abort probability.&lt;/li&gt;
&lt;li&gt;PostgreSQL is the first commercial database to implement SSI, while many traditional relational databases don&amp;rsquo;t support serializability at all. PostgreSQL has taken a big step forward.&lt;/li&gt;
&lt;li&gt;PostgreSQL not only implemented SSI but also made many optimizations on top of it, such as read-only transaction and memory optimizations, with significant results.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>How Does PG Access Basic System Tables Before pg_class Exists?</title><link>https://lastdba.com/en/2024/08/12/how-does-pg-access-basic-system-tables-before-pg_class-exists/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/how-does-pg-access-basic-system-tables-before-pg_class-exists/</guid><description>&lt;p&gt;How does the database access system tables before pg_class exists? This question can be divided into two stages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Database cluster initialization — at this point no database exists at all, so how to construct and access system tables like pg_class is a problem.&lt;/li&gt;
&lt;li&gt;Private memory initialization of system tables. PG stores system table information in the local backend process. How does the backend load pg_class during initialization?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Initializing the Data Dictionary
 &lt;div id="initializing-the-data-dictionary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#initializing-the-data-dictionary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When the database hasn&amp;rsquo;t been initialized yet, it&amp;rsquo;s obviously impossible to access the data dictionary to initialize objects like database, pg_class, etc., because without a database you can&amp;rsquo;t &lt;code&gt;CREATE DATABASE&lt;/code&gt;, and without pg_class you can&amp;rsquo;t look up metadata information.&lt;/p&gt;</description><content:encoded>&lt;p&gt;How does the database access system tables before pg_class exists? This question can be divided into two stages:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Database cluster initialization — at this point no database exists at all, so how to construct and access system tables like pg_class is a problem.&lt;/li&gt;
&lt;li&gt;Private memory initialization of system tables. PG stores system table information in the local backend process. How does the backend load pg_class during initialization?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 class="relative group"&gt;Initializing the Data Dictionary
 &lt;div id="initializing-the-data-dictionary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#initializing-the-data-dictionary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When the database hasn&amp;rsquo;t been initialized yet, it&amp;rsquo;s obviously impossible to access the data dictionary to initialize objects like database, pg_class, etc., because without a database you can&amp;rsquo;t &lt;code&gt;CREATE DATABASE&lt;/code&gt;, and without pg_class you can&amp;rsquo;t look up metadata information.&lt;/p&gt;
&lt;p&gt;PG uses a special language in BKI files to initialize some data structures, then initializes a primitive database in bootstrap mode&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Compilation Phase: genbki.h &amp;amp; genbki.pl
 &lt;div id="compilation-phase-genbkih--genbkipl" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#compilation-phase-genbkih--genbkipl" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;src/include/catalog/genbki.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; genbki.h defines &lt;span style="color:#a6e22e"&gt;CATALOG&lt;/span&gt;(), BKI_BOOTSTRAP and related macros
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; so that the catalog header files can be read by the C compiler.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; (These same words are recognized by genbki.pl to build the BKI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; bootstrap file from these header files.)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;genbki.h&lt;/code&gt; is quite minimal — mainly macro definitions for catalog-related operations, as well as macros for the BKI bootstrap file. Data dictionary header files all include &lt;code&gt;genbki.h&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;genbki.pl&lt;/code&gt; reads the &lt;code&gt;.h&lt;/code&gt; table definition files from &lt;code&gt;/src/include/catalog&lt;/code&gt; during compilation (excluding &lt;code&gt;pg_*_d.h&lt;/code&gt;), and creates the &lt;code&gt;postgres.bki&lt;/code&gt; file and &lt;code&gt;pg_*_d.h&lt;/code&gt; header files.&lt;/p&gt;
&lt;p&gt;Taking pg_class as an example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@catalog]$ ll |grep pg_class
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r----- 1 postgres postgres 3682 Aug 6 2019 pg_class.dat
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lrwxrwxrwx 1 postgres postgres 86 Apr 8 20:31 pg_class_d.h -&amp;gt; /lzl/soft/postgresql-11.5/src/backend/catalog/pg_class_d.h
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw-r----- 1 postgres postgres 5219 Aug 6 2019 pg_class.h&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;pg_*_d.h&lt;/code&gt; header files are generated by &lt;code&gt;genbki.pl&lt;/code&gt;. All &lt;code&gt;pg_*_d.h&lt;/code&gt; files contain the following line:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;It has been GENERATED by src/backend/catalog/genbki.pl&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Each data dictionary has a struct &lt;code&gt;typedef struct FormData_*catalogname*&lt;/code&gt; for storing the row data of the data dictionary&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;, for example pg_class&amp;rsquo;s &lt;code&gt;FormData_pg_class&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;CATALOG&lt;/span&gt;(pg_class,&lt;span style="color:#ae81ff"&gt;1259&lt;/span&gt;,RelationRelationId) BKI_BOOTSTRAP &lt;span style="color:#a6e22e"&gt;BKI_ROWTYPE_OID&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt;,RelationRelation_Rowtype_Id) BKI_SCHEMA_MACRO
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* oid */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			oid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* class name */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	NameData	relname;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* OID of namespace containing this class */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			relnamespace &lt;span style="color:#a6e22e"&gt;BKI_DEFAULT&lt;/span&gt;(pg_catalog) &lt;span style="color:#a6e22e"&gt;BKI_LOOKUP&lt;/span&gt;(pg_namespace);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* OID of entry in pg_type for relation&amp;#39;s implicit row type, if any */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			reltype &lt;span style="color:#a6e22e"&gt;BKI_LOOKUP_OPT&lt;/span&gt;(pg_type);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* OID of entry in pg_type for underlying composite type, if any */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			reloftype &lt;span style="color:#a6e22e"&gt;BKI_DEFAULT&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#a6e22e"&gt;BKI_LOOKUP_OPT&lt;/span&gt;(pg_type);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* class owner */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			relowner &lt;span style="color:#a6e22e"&gt;BKI_DEFAULT&lt;/span&gt;(POSTGRES) &lt;span style="color:#a6e22e"&gt;BKI_LOOKUP&lt;/span&gt;(pg_authid);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* access-method-specific options */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	text		reloptions[&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;] &lt;span style="color:#a6e22e"&gt;BKI_DEFAULT&lt;/span&gt;(_null_);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* partition bound node tree */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	pg_node_tree relpartbound &lt;span style="color:#a6e22e"&gt;BKI_DEFAULT&lt;/span&gt;(_null_);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#endif
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;} FormData_pg_class;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;pg_class&amp;rsquo;s OID is hardcoded as 1259, and all fields are in the &lt;code&gt;FormData_pg_class&lt;/code&gt; struct.&lt;/p&gt;
&lt;p&gt;After initializing the struct for user data storage, the corresponding &lt;code&gt;.dat&lt;/code&gt; file is used to insert base data. pg_class inserts 4 rows of data, which can be understood as bootstrap items (49 data dictionary tables in PG15):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{ oid =&amp;gt; &amp;#39;1247&amp;#39;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname =&amp;gt; &amp;#39;pg_type&amp;#39;, reltype =&amp;gt; &amp;#39;pg_type&amp;#39; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{ oid =&amp;gt; &amp;#39;1249&amp;#39;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname =&amp;gt; &amp;#39;pg_attribute&amp;#39;, reltype =&amp;gt; &amp;#39;pg_attribute&amp;#39; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{ oid =&amp;gt; &amp;#39;1255&amp;#39;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname =&amp;gt; &amp;#39;pg_proc&amp;#39;, reltype =&amp;gt; &amp;#39;pg_proc&amp;#39; },
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{ oid =&amp;gt; &amp;#39;1259&amp;#39;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname =&amp;gt; &amp;#39;pg_class&amp;#39;, reltype =&amp;gt; &amp;#39;pg_class&amp;#39; },&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;postgres&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,relname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; oid::int &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1247&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; oid::int&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1259&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1247&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_type
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1249&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_attribute
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1255&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_proc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1259&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_class&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Once the base data dictionary is written, everything else can be generated from it.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Database Initialization Phase: initdb &amp;amp; postgres.bki
 &lt;div id="database-initialization-phase-initdb--postgresbki" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#database-initialization-phase-initdb--postgresbki" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Comment from &lt;code&gt;initdb.c&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; To create template1, we run the &lt;span style="color:#a6e22e"&gt;postgres&lt;/span&gt; (backend) program in bootstrap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; mode and feed it data from the postgres.bki library file. After this
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; initial bootstrap phase, some additional stuff is created by normal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; SQL commands fed to a standalone backend.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The backend is launched in bootstrap mode and runs the postgres.bki script. postgres.bki can execute relevant functions without any system tables. Only after this can normal SQL files and standard backend processes be used.&lt;/p&gt;
&lt;p&gt;template1 can be called the bootstrap database. The postgres and template0 databases are created only after template1 is established:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;initialize_data_directory&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Bootstrap template1 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;bootstrap_template1&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;make_template0&lt;/span&gt;(cmdfd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;make_postgres&lt;/span&gt;(cmdfd);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	PG_CMD_CLOSE;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;check_ok&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Once template1 exists, &lt;code&gt;make_template0&lt;/code&gt; and &lt;code&gt;make_postgres&lt;/code&gt; create the corresponding template0 and postgres databases, using the normal SQL &lt;code&gt;CREATE DATABASE&lt;/code&gt; command:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * copy template1 to postgres
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;make_postgres&lt;/span&gt;(FILE &lt;span style="color:#f92672"&gt;*&lt;/span&gt;cmdfd)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;line;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Just as we did for template0, and for the same reasons, assign a fixed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * OID to postgres and select the file_copy strategy.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;char&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; postgres_setup[] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#e6db74"&gt;&amp;#34;CREATE DATABASE postgres OID = &amp;#34;&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;CppAsString2&lt;/span&gt;(PostgresDbOid)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#e6db74"&gt;&amp;#34; STRATEGY = file_copy;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\n\n&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#e6db74"&gt;&amp;#34;COMMENT ON DATABASE postgres IS &amp;#39;default administrative connection database&amp;#39;;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\n\n&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		NULL
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	};
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (line &lt;span style="color:#f92672"&gt;=&lt;/span&gt; postgres_setup; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;line; line&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;PG_CMD_PUTS&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;line);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Backend Local Cache of Data Dictionary
 &lt;div id="backend-local-cache-of-data-dictionary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#backend-local-cache-of-data-dictionary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;For PG private memory basics, refer to &lt;a href="https://blog.csdn.net/qq_40687433/article/details/135541103" target="_blank" rel="noreferrer"&gt;PostgreSQL Memory Analysis&lt;/a&gt;&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;PG&amp;rsquo;s data dictionary information is stored in the local backend process, not shared. The data dictionary cache mainly focuses on syscache/catcache and relcache, which cache system table and table schema information respectively.&lt;/p&gt;
&lt;p&gt;syscache/catcache is used to cache system tables, with syscache acting as the upper layer of catcache. syscache is an array where each element corresponds to a catcache, and each catcache corresponds to a system table&lt;sup id="fnref1:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;//PG15.3 SysCacheSize=35
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; CatCache &lt;span style="color:#f92672"&gt;*&lt;/span&gt;SysCache[SysCacheSize];&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When PG forks a backend, it calls &lt;code&gt;InitPostgres&lt;/code&gt;, which calls the initialization functions for syscache/catcache and relcache. Let&amp;rsquo;s look at backend initialization.&lt;/p&gt;

&lt;h3 class="relative group"&gt;syscache/catcache Initialization
 &lt;div id="syscachecatcache-initialization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#syscachecatcache-initialization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; cachedesc
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			reloid;			&lt;span style="color:#75715e"&gt;/* OID of the relation being cached */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	Oid			indoid;			&lt;span style="color:#75715e"&gt;/* OID of index relation for this cache */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			nkeys;			&lt;span style="color:#75715e"&gt;/* # of keys needed for cache lookup */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			key[&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;];			&lt;span style="color:#75715e"&gt;/* attribute numbers of key attrs */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			nbuckets;		&lt;span style="color:#75715e"&gt;/* number of hash buckets for this cache */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;struct&lt;/span&gt; cachedesc cacheinfo[] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; {
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{RelationRelationId,		&lt;span style="color:#75715e"&gt;/* RELNAMENSP */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ClassNameNspIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			Anum_pg_class_relname,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			Anum_pg_class_relnamespace,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{RelationRelationId,		&lt;span style="color:#75715e"&gt;/* RELOID */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		ClassOidIndexId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			Anum_pg_class_oid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		},
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#ae81ff"&gt;128&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;};&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For example, &lt;code&gt;Anum_pg_class_oid&lt;/code&gt; is defined in &lt;code&gt;pg_class_d.h&lt;/code&gt; generated by &lt;code&gt;genbki.pl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define Anum_pg_class_oid 1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;reloid is the OID:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,relname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; oid::int &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1247&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; oid::int&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1259&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1259&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_class&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;InitCatalogCache&lt;/code&gt; actually initializes the syscache array, i.e., initializes all catcaches. &lt;code&gt;InitCatalogCache&lt;/code&gt; eventually fully initializes CatCache through &lt;code&gt;InitCatCache&lt;/code&gt; (one of which is for pg_class):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;InitCatalogCache&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (cacheId &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; cacheId &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; SysCacheSize; cacheId&lt;span style="color:#f92672"&gt;++&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		SysCache[cacheId] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;InitCatCache&lt;/span&gt;(cacheId,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 cacheinfo[cacheId].reloid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 cacheinfo[cacheId].indoid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 cacheinfo[cacheId].nkeys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 cacheinfo[cacheId].key,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;										 cacheinfo[cacheId].nbuckets);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;PointerIsValid&lt;/span&gt;(SysCache[cacheId]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;elog&lt;/span&gt;(ERROR, &lt;span style="color:#e6db74"&gt;&amp;#34;could not initialize cache %u (%d)&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 cacheinfo[cacheId].reloid, cacheId);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* Accumulate data for OID lists, too */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		SysCacheRelationOid[SysCacheRelationOidSize&lt;span style="color:#f92672"&gt;++&lt;/span&gt;] &lt;span style="color:#f92672"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			cacheinfo[cacheId].reloid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		SysCacheSupportingRelOid[SysCacheSupportingRelOidSize&lt;span style="color:#f92672"&gt;++&lt;/span&gt;] &lt;span style="color:#f92672"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			cacheinfo[cacheId].reloid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		SysCacheSupportingRelOid[SysCacheSupportingRelOidSize&lt;span style="color:#f92672"&gt;++&lt;/span&gt;] &lt;span style="color:#f92672"&gt;=&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			cacheinfo[cacheId].indoid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* see comments for RelationInvalidatesSnapshotsOnly */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;Assert&lt;/span&gt;(&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;RelationInvalidatesSnapshotsOnly&lt;/span&gt;(cacheinfo[cacheId].reloid));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	CacheInitialized &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then we come to &lt;code&gt;catcache.c&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;InitCatCache&lt;/code&gt; allocates memory and manages it in &lt;code&gt;CacheMemoryContext&lt;/code&gt;. It only assigns some macro-defined OIDs to the corresponding catcache — at this point, tables are not yet opened:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		InitCatCache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	This allocates and initializes a cache for a system catalog relation.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	Actually, the cache is only partially initialized to avoid opening the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	relation. The relation will be opened and the rest of the cache
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	structure initialized on the first access.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CatCache &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;InitCatCache&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; id,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 Oid reloid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 Oid indexoid,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; nkeys,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#66d9ef"&gt;const&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;key,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 &lt;span style="color:#66d9ef"&gt;int&lt;/span&gt; nbuckets)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	oldcxt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(CacheMemoryContext);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	sz &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(CatCache) &lt;span style="color:#f92672"&gt;+&lt;/span&gt; PG_CACHE_LINE_SIZE;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (CatCache &lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#a6e22e"&gt;CACHELINEALIGN&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;palloc0&lt;/span&gt;(sz));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_bucket &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;palloc0&lt;/span&gt;(nbuckets &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;sizeof&lt;/span&gt;(dlist_head));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * initialize the cache&amp;#39;s relation information for the relation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * corresponding to this cache, and initialize some of the new cache&amp;#39;s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * other internal fields. But don&amp;#39;t open the relation yet.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_relname &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;(not known yet)&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_reloid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; reloid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_indexoid &lt;span style="color:#f92672"&gt;=&lt;/span&gt; indexoid;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_relisshared &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false; &lt;span style="color:#75715e"&gt;/* temporary */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_tupdesc &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (TupleDesc) NULL;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_ntup &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_nbuckets &lt;span style="color:#f92672"&gt;=&lt;/span&gt; nbuckets;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_nkeys &lt;span style="color:#f92672"&gt;=&lt;/span&gt; nkeys;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; (i &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;; i &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; nkeys; &lt;span style="color:#f92672"&gt;++&lt;/span&gt;i)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		cp&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cc_keyno[i] &lt;span style="color:#f92672"&gt;=&lt;/span&gt; key[i];
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(oldcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; cp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;id&lt;/code&gt; is the index of the catcache array element. The assigned &lt;code&gt;reloid&lt;/code&gt; is the known OID from cacheinfo, and &lt;code&gt;key[4]&lt;/code&gt; from cacheinfo is also assigned. Other information is mostly unknown yet — for example, relname, tupdesc — because system tables haven&amp;rsquo;t been opened yet.&lt;/p&gt;
&lt;p&gt;catcache only opens tables during search operations. Although the function name contains &lt;code&gt;*init*&lt;/code&gt;, it&amp;rsquo;s no longer in the initialization process — the relevant functions won&amp;rsquo;t be shown here.&lt;/p&gt;
&lt;p&gt;After syscache/catcache initialization completes, there is actually no tuple information at all.&lt;/p&gt;

&lt;h3 class="relative group"&gt;relcache Initialization
 &lt;div id="relcache-initialization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#relcache-initialization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The relcache initialization is well explained in &lt;a href="https://blog.csdn.net/qq_40687433/article/details/135541103" target="_blank" rel="noreferrer"&gt;PostgreSQL Memory Analysis&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;relcache initialization has 5 phases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;RelationCacheInitialize - initializes relcache, initially empty&lt;/li&gt;
&lt;li&gt;RelationCacheInitializePhase2 - initializes shared catalogs and loads 5 global system tables&lt;/li&gt;
&lt;li&gt;RelationCacheInitializePhase3 - completes relcache initialization and loads 4 basic system tables&lt;/li&gt;
&lt;li&gt;RelationIdGetRelation - gets rel description by relation id&lt;/li&gt;
&lt;li&gt;RelationClose - closes a relation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both &lt;code&gt;RelationCacheInitializePhase2&lt;/code&gt; and &lt;code&gt;RelationCacheInitializePhase3&lt;/code&gt; load system tables, and they must be in order.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;RelationCacheInitializePhase2&lt;/code&gt; loads several system tables — interested readers can check the function themselves. &lt;code&gt;RelationCacheInitializePhase3&lt;/code&gt; is the one relevant to our question, let&amp;rsquo;s look at that:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		RelationCacheInitializePhase3
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		This is called as soon as the catcache and transaction system
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		are functional and we have determined MyDatabaseId. At this point
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		we can actually read data from the database&amp;#39;s system catalogs.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		We first try to read pre-computed relcache entries from the local
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		relcache init file. If that&amp;#39;s missing or broken, make phony entries
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		for the minimum set of nailed-in-cache relations. Then (unless
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		bootstrapping) make sure we have entries for the critical system
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		indexes. Once we&amp;#39;ve done all this, we have enough infrastructure to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		open any system catalog or use any catcache. The last step is to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *		rewrite the cache files if needed.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;RelationCacheInitializePhase3&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsBootstrapProcessingMode&lt;/span&gt;() &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#f92672"&gt;!&lt;/span&gt;&lt;span style="color:#a6e22e"&gt;load_relcache_init_file&lt;/span&gt;(false))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		needNewCacheFile &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_class&amp;#34;&lt;/span&gt;, RelationRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_class, Desc_pg_class);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_attribute&amp;#34;&lt;/span&gt;, AttributeRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_attribute, Desc_pg_attribute);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_proc&amp;#34;&lt;/span&gt;, ProcedureRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_proc, Desc_pg_proc);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;formrdesc&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;pg_type&amp;#34;&lt;/span&gt;, TypeRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_type, Desc_pg_type);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define NUM_CRITICAL_LOCAL_RELS 4	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* fix if you change list above */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;MemoryContextSwitchTo&lt;/span&gt;(oldcxt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* In bootstrap mode, the faked-up formrdesc info is all we&amp;#39;ll have */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IsBootstrapProcessingMode&lt;/span&gt;())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/* now write the files */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;write_relcache_init_file&lt;/span&gt;(true);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;write_relcache_init_file&lt;/span&gt;(false);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;IsBootstrapProcessingMode&lt;/code&gt; is specifically designed for bootstrap mode — normal backends don&amp;rsquo;t satisfy this condition.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;load_relcache_init_file(false)&lt;/code&gt; attempts to load system table information from the init file. &lt;code&gt;load_relcache_init_file(false)&lt;/code&gt; passes &lt;code&gt;false&lt;/code&gt; meaning it&amp;rsquo;s a private init file, not a shared one:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pwd
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pgdata&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzl&lt;span style="color:#f92672"&gt;/&lt;/span&gt;data15_6879&lt;span style="color:#f92672"&gt;/&lt;/span&gt;base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Rough view. strings ignores some info, but table and column names are visible
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; strings pg_internal.init &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep pg_class
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_class_oid_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_class
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_class_relname_nsp_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; strings pg_internal.init &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#f92672"&gt;-&lt;/span&gt;E &lt;span style="color:#e6db74"&gt;&amp;#34;pg_class|relname&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_class_oid_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_class
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relnamespace
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_class_relname_nsp_index
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;relnamespace&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the init file is damaged or doesn&amp;rsquo;t exist, loading the init file fails and enters the branch to load 4 basic system tables:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	// Similar to phase 2, load more system table descriptions
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	if (IsBootstrapProcessingMode() ||
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		!load_relcache_init_file(false))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		needNewCacheFile = true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		formrdesc(&amp;#34;pg_class&amp;#34;, RelationRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_class, Desc_pg_class);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		formrdesc(&amp;#34;pg_attribute&amp;#34;, AttributeRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_attribute, Desc_pg_attribute);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		formrdesc(&amp;#34;pg_proc&amp;#34;, ProcedureRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_proc, Desc_pg_proc);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		formrdesc(&amp;#34;pg_type&amp;#34;, TypeRelation_Rowtype_Id, false,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 Natts_pg_type, Desc_pg_type);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With the 4 basic tables including pg_class, loading subsequent system table information becomes straightforward.&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;《PostgreSQL Kernel Analysis》 Chapters 2, 3&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/system-catalog-declarations.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/system-catalog-declarations.html&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/135541103" target="_blank" rel="noreferrer"&gt;PostgreSQL Memory Analysis&lt;/a&gt;&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title>How to Solve Index Split Contention?</title><link>https://lastdba.com/en/2024/08/12/how-to-solve-index-split-contention/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/how-to-solve-index-split-contention/</guid><description>&lt;h2 class="relative group"&gt;Index Splitting
 &lt;div id="index-splitting" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-splitting" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When an index block is nearly full, index splitting occurs. Index splitting comes in two forms: 55 and 91:



&lt;img src="https://lastdba.com/img/csdn/cd43a5c7b484.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/be40fcc99a6d.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The difference between index splitting and the enq: TX - index contention wait event&lt;/strong&gt;
Whether 55 or 91 splitting, both are normal index behavior as data volume increases. Index splitting is a &lt;strong&gt;normal phenomenon&lt;/strong&gt; caused by growing data volume leading to larger indexes — when an index can&amp;rsquo;t hold more data, it naturally needs more index blocks. There are hardly any scenarios with tables but no indexes (only during initial data loading would one consider inserting data first and building indexes afterward). Although index splitting consumes some resources, in today&amp;rsquo;s Oracle environments it can complete quickly. Only when there are too many indexes does it affect insert efficiency.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Index Splitting
 &lt;div id="index-splitting" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-splitting" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When an index block is nearly full, index splitting occurs. Index splitting comes in two forms: 55 and 91:



&lt;img src="https://lastdba.com/img/csdn/cd43a5c7b484.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/be40fcc99a6d.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The difference between index splitting and the enq: TX - index contention wait event&lt;/strong&gt;
Whether 55 or 91 splitting, both are normal index behavior as data volume increases. Index splitting is a &lt;strong&gt;normal phenomenon&lt;/strong&gt; caused by growing data volume leading to larger indexes — when an index can&amp;rsquo;t hold more data, it naturally needs more index blocks. There are hardly any scenarios with tables but no indexes (only during initial data loading would one consider inserting data first and building indexes afterward). Although index splitting consumes some resources, in today&amp;rsquo;s Oracle environments it can complete quickly. Only when there are too many indexes does it affect insert efficiency.&lt;/p&gt;
&lt;p&gt;However, the enq: TX - index contention wait is NOT normal. enq: TX - index contention indicates that SQL statements are waiting on an index block that is currently being split. Essentially, DML concurrency is too high and all sessions are waiting on the splitting index block.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why does enq: TX - index contention always occur on sequentially inserted columns?&lt;/strong&gt;
Although both 55 and 55 splits are possible in real scenarios, enq: TX - index contention frequently occurs with 91 splits. This is because columns like sequences and timestamps usually have indexes, and sequential inserts are common. The rightmost block is always the hot block, and subsequent inserts must wait for the split block to complete before they can proceed — this causes enq: TX - index contention. Why don&amp;rsquo;t UUID indexes cause enq: TX - index contention? Because UUID indexes are unordered — inserting causes UUID index splits, but it&amp;rsquo;s unlikely that subsequent UUID values also land on that same splitting index block. So UUID has index splitting but doesn&amp;rsquo;t form an enq wait queue leading to enq: TX - index contention.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Solutions
 &lt;div id="solutions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solutions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Note: what we need to solve is the index split wait enq: TX - index contention, not index splitting itself. Solutions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Reverse Index&lt;/strong&gt;
A reverse index stores key values in the opposite order. For example, for the value &amp;lsquo;1111 0001&amp;rsquo;, a normal index places it after &amp;lsquo;0000 0002&amp;rsquo;; with a reverse index, it&amp;rsquo;s placed before &amp;lsquo;0000 0002&amp;rsquo;. Think about a timestamp column — normally it&amp;rsquo;s a rightmost hot spot. After reversing, seconds, minutes, and hours sort first. One index block might contain data from different months but the same second. This way, the rightmost hot block essentially disappears — reverse indexes scatter hot spots across various index blocks.
&lt;em&gt;Limitations&lt;/em&gt;: Requires index modification; may lose index range scan capability. Sequentially growing columns cannot use index range scans (e.g., timestamp columns). In some scenarios, reverse key values might still work — requires specific analysis.
&lt;em&gt;Syntax&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; reveridx &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; tablzl (name) REVERSE;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2. Hash-Partitioned Index&lt;/strong&gt;
Creating a hash-partitioned index on a regular table is equivalent to keeping the table unchanged but partitioning the index, thus scattering the rightmost hot block across partitions. For example, an 8-partition hash-partitioned index divides the index into 8 segments, creating 8 rightmost hot spots and alleviating the index split problem.
&lt;em&gt;Limitations&lt;/em&gt;: Requires index modification; affects index range query performance — requires balancing insert hot spot mitigation vs. query efficiency.
Equality and IN queries can efficiently use hash-partitioned indexes. From the official documentation:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Queries involving equality and &lt;code&gt;IN&lt;/code&gt; predicates on index partitioning key can efficiently use global hash partitioned index to answer queries quickly&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;However, range scan efficiency decreases — the more partitions, the greater the decrease (though more partitions also provide better hot spot relief). This is clearly a balancing act. Tests show that with 8 partitions, logical reads for range scans increase nearly 8x. After partitioning, indexes within each partition remain ordered, and clustering factor differences are minor — the cost of scanning the index is similar, but the cost of table access increases. If a regular index has 8 entries in one block pointing to 1 data block (1 logical read), after hash partitioning across 8 partitions (1 index block each), it becomes 8 logical reads. This is why range scan index performance degrades.
&lt;em&gt;Syntax&lt;/em&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; cust_last_name_ix &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; customers (cust_last_name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;GLOBAL&lt;/span&gt; PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; HASH (cust_last_name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PARTITIONS &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Using Table Partitioning to Scatter Indexes&lt;/strong&gt;
Partition the table and create local indexes to scatter the rightmost hot spots.
&lt;em&gt;Limitations&lt;/em&gt;: The partition key cannot be the index column (otherwise it defeats the purpose); requires table modification; if existing SQL already has partition key predicates, range scan efficiency is not affected.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Reduce Concurrency&lt;/strong&gt;
Reducing concurrency is the ultimate weapon. Index split contention is fundamentally caused by excessively high concurrency — generally, without dozens of concurrent inserts, index split contention won&amp;rsquo;t occur.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Modify Index Block Size&lt;/strong&gt;
Place index blocks in 16K or 32K tablespaces. In theory, this should help because indexes can hold more data and splitting occurs less frequently. However, performance testing is needed, and other parameters may need adjustment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. Remove the Index&lt;/strong&gt;
Removing the index is also an option. Based on business requirements, if the index is not important, drop it. Or use range queries with partitioned tables, leveraging partition pruning instead of indexes.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why These Approaches Don&amp;rsquo;t Work???
 &lt;div id="why-these-approaches-dont-work" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-these-approaches-dont-work" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Increasing ITL transaction slots&lt;/strong&gt;: Index block transaction slots may also be insufficient under high concurrency — this is indeed similar to index splitting, but the wait event is enq: TX - allocate ITL entry. If this wait is observed and traced to index blocks, it indicates high concurrency on the index. Reverse indexes and hash-partitioned indexes can also help, and adjusting initrans may solve the problem. However, the root causes of these two wait events differ — index splitting doesn&amp;rsquo;t always come with transaction slot issues.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adjusting index block PCTFREE&lt;/strong&gt;: PCTFREE indicates that when a block&amp;rsquo;s free space falls below PCTFREE, it is no longer recorded in FREELIST and cannot accept new inserts. Consider two cases: increasing and decreasing PCTFREE. Increasing PCTFREE only worsens index splitting. Decreasing PCTFREE seems effective — similar to adjusting block size in principle — but in real scenarios PCTFREE defaults to 10%, which is already hard to reduce further, so the effect is negligible.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Rebuilding indexes to reduce fragmentation&lt;/strong&gt;: This is essentially unrelated — it doesn&amp;rsquo;t solve the rightmost hot block problem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/lihuarongaini/article/details/101299328" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/lihuarongaini/article/details/101299328&lt;/a&gt;
&lt;a href="https://docs.oracle.com/cd/E11882_01/server.112/e41573/data_acc.htm#PFGRF94786" target="_blank" rel="noreferrer"&gt;https://docs.oracle.com/cd/E11882_01/server.112/e41573/data_acc.htm#PFGRF94786&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Acknowledgments: 豪桑, 用哥&lt;/p&gt;</content:encoded></item><item><title>Incorrect Execution Plan Caused by Partition Permission Issues</title><link>https://lastdba.com/en/2024/08/12/incorrect-execution-plan-caused-by-partition-permission-issues/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/incorrect-execution-plan-caused-by-partition-permission-issues/</guid><description>&lt;h2 class="relative group"&gt;Problem Overview
 &lt;div id="problem-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Last night, the business team updated a SQL query. Previously, the query ran very fast without the &lt;code&gt;DATE_CREATED&lt;/code&gt; field (the partition key). After the release, the partition field was added to reduce the number of partitions accessed. However, after adding it, the UPDATE execution actually became slower.&lt;/p&gt;
&lt;p&gt;Before:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Before the release, access time was in milliseconds. After the release, access time was 10 seconds. The SQL runs frequently, and the business found this unacceptable.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Overview
 &lt;div id="problem-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Last night, the business team updated a SQL query. Previously, the query ran very fast without the &lt;code&gt;DATE_CREATED&lt;/code&gt; field (the partition key). After the release, the partition field was added to reduce the number of partitions accessed. However, after adding it, the UPDATE execution actually became slower.&lt;/p&gt;
&lt;p&gt;Before:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Before the release, access time was in milliseconds. After the release, access time was 10 seconds. The SQL runs frequently, and the business found this unacceptable.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Problem Analysis
 &lt;div id="problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;The Execution Plan Appeared Correct
 &lt;div id="the-execution-plan-appeared-correct" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-execution-plan-appeared-correct" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Table structure:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.TABLE_RECORD&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------+-----------------------------+-----------+----------+---------------------------------------------------+----------+--------------+--------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id_TABLE_RECORD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;seq_TABLE_RECORD&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; appl_no &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; r_appl_no &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; created_by &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sys&amp;#39;&lt;/span&gt;::character varying &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; updated_by &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sys&amp;#39;&lt;/span&gt;::character varying &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_updated &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: RANGE (date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;date_TABLE_RECORD&amp;#34;&lt;/span&gt; btree (date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_dateupdated&amp;#34;&lt;/span&gt; btree (date_updated)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_applnodeleted&amp;#34;&lt;/span&gt; btree (appl_no, is_deleted)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;nk_TABLE_RECORD&amp;#34;&lt;/span&gt; btree (appl_no)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: TABLE_RECORD_202211 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2022-11-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2022-12-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202303 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202304 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202305 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202306 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_202512 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2025-12-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2026-01-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; TABLE_RECORD_other &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This SQL would access partitions from the last 2 months, both of which contained data. The above UPDATE would only update one row.&lt;/p&gt;
&lt;p&gt;At first, analyzing the problem was very confusing because when we ran EXPLAIN, the execution plan looked fine.&lt;/p&gt;
&lt;p&gt;EXPLAIN partition scan info:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202302_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202302 TABLE_RECORD_4 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;485&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202303_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202303 TABLE_RECORD_5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;482&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202304_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202304 TABLE_RECORD_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;481&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_25 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202305 TABLE_RECORD_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;483&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_14 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202306 TABLE_RECORD_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;485&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_38 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202307 TABLE_RECORD_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202308 TABLE_RECORD_10 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Partition data distribution:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;),tableoid::regclass &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; TABLE_RECORD &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tableoid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;56558&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4436&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202211
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6929&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202306
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;945&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202305
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1413&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202304
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5499&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202212
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1486&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4722&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202302&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The execution plan appeared to access different indexes for different partitions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;date_TABLE_RECORD&lt;/code&gt;: index on the partition key&lt;/li&gt;
&lt;li&gt;&lt;code&gt;idx_applnodeleted&lt;/code&gt;: composite index on &lt;code&gt;appl_no, is_deleted&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In reality, the SQL could prune partitions using the &lt;code&gt;DATE_CREATED&lt;/code&gt; (last 31 days) field. But if it used the index on that field, there would be no selectivity at all. The composite index &lt;code&gt;idx_applnodeleted&lt;/code&gt; on &lt;code&gt;appl_no, is_deleted&lt;/code&gt; had much better selectivity within partitions, so the correct execution plan should choose the &lt;code&gt;idx_applnodeleted&lt;/code&gt; composite index.&lt;/p&gt;
&lt;p&gt;The EXPLAIN plan above is not the actual execution plan, but we can see that the May and June partitions did use the correct index — the &lt;code&gt;appl_no, is_deleted&lt;/code&gt; composite index.&lt;/p&gt;
&lt;p&gt;To view the actual execution plan, we need to execute the SQL. So we changed the UPDATE to a SELECT:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;,buffers,timing,&lt;span style="color:#66d9ef"&gt;verbose&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; TABLE_RECORD &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now() ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;266&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;266&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;565&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;566&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;265&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;95&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;388&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;558&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Subplans Removed: &lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_25 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.TABLE_RECORD_202305 TABLE_RECORD_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;059&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;059&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((TABLE_RECORD_1.appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((TABLE_RECORD_1.is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((TABLE_RECORD_1.date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (TABLE_RECORD_1.date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_14 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.TABLE_RECORD_202306 TABLE_RECORD_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;328&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;498&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((TABLE_RECORD_2.appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((TABLE_RECORD_2.is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((TABLE_RECORD_2.date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (TABLE_RECORD_2.date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5867&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;195&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;654&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SELECT only accessed the May and June partitions, indicating partition pruning worked correctly. Both partitions used the &lt;code&gt;idx_applnodeleted&lt;/code&gt; index, so index selection was also correct.&lt;/p&gt;
&lt;p&gt;Direct execution of the SELECT statement returned results in milliseconds:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; TABLE_RECORD &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now() ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;946&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At this point in the analysis, the execution plan appeared normal and execution time appeared normal.&lt;/p&gt;

&lt;h3 class="relative group"&gt;The Business SQL Was Still Slow
 &lt;div id="the-business-sql-was-still-slow" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-business-sql-was-still-slow" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;However, slow SQL still appeared in the PostgreSQL logs — the UPDATE took 10 seconds:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;077&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldbopr&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;116286&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.78.90:51871&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;649&lt;/span&gt;cdebf.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;c63e,&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;UPDATE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;09&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;759&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12440291&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4002354803&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 10287.105 ms &amp;#34;&lt;/span&gt; plan:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Query Text: &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;, DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LOCALTIMESTAMP&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;203&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;79&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2960&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202211 TABLE_RECORD_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202304_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202304 TABLE_RECORD_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;481&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202305_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202305 TABLE_RECORD_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;483&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202306_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202306 TABLE_RECORD_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;485&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_38 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202307 TABLE_RECORD_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The May and June partitions were still using the &lt;code&gt;date_created&lt;/code&gt; index on the partition key. The execution plan estimated only 1 row, but in reality these two partitions each had millions of rows.&lt;/p&gt;
&lt;p&gt;This was very confusing — the optimizer itself could choose a better index, and EXPLAIN showed it going to that index, but the business SQL simply wasn&amp;rsquo;t using the correct index.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Updating Statistics
 &lt;div id="updating-statistics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#updating-statistics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since this was a PostgreSQL execution plan issue, the first thought was to collect statistics.&lt;/p&gt;
&lt;p&gt;After the problem occurred, we collected statistics for both the parent partitioned table and child partitions. Concerned that sessions might have cached the execution plan (&lt;code&gt;plan_cache_mode=auto&lt;/code&gt;), we killed all sessions that connected before the statistics collection.&lt;/p&gt;
&lt;p&gt;The logs still showed the SQL taking 10 seconds, indicating it wasn&amp;rsquo;t a statistics issue.&lt;/p&gt;
&lt;p&gt;At this point the problem remained unsolved. We seemed to have exhausted all options.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Root Cause
 &lt;div id="root-cause" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#root-cause" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Earlier, when analyzing execution plans, the DBA&amp;rsquo;s EXPLAIN output differed from the application&amp;rsquo;s execution plan. However, we had been executing everything as the PostgreSQL superuser. We switched to the application user and ran EXPLAIN again — the execution plan matched what was in the logs!&lt;/p&gt;
&lt;p&gt;Since we had previously encountered issues with native partitioned table permissions causing abnormal execution plans, we immediately checked partition permissions.&lt;/p&gt;
&lt;p&gt;Parent table permissions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; TABLE_RECORD
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+--------------------------+-------------------+-------------------------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldbdata&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwdDxt&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r_lzldbdata_qry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r_lzldbdata_dml&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwd&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Child partition permissions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; TABLE_RECORD_202505
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------------------------------+-------+------------------------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; TABLE_RECORD_202505 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldbdata&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwdDxt&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The partition permissions were missing the &lt;code&gt;r_lzldbdata_dml&lt;/code&gt; role, which is granted to the business user.&lt;/p&gt;
&lt;p&gt;We immediately granted the permissions, and the problem was resolved:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202305 &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; r_lzldbdata_dml;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202306 &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; r_lzldbdata_dml;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After switching to the &lt;code&gt;opr&lt;/code&gt; user again and running EXPLAIN, the execution plan was correct — the May and June partitions used the proper index:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;\c - lzldbopr&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202303_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202303 TABLE_RECORD_5 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;482&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; TABLE_RECORD_202304_date_created_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202304 TABLE_RECORD_6 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;481&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_25 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202305 TABLE_RECORD_7 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;43&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;483&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_14 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202306 TABLE_RECORD_8 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;56&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;57&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;485&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_38 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202307 TABLE_RECORD_9 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_applnodeleted_1 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; TABLE_RECORD_202308 TABLE_RECORD_10 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3502&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LZLMATH20230132302302&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;No more slow UPDATE statements were observed in the PostgreSQL logs.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Testing (Not Reproduced)
 &lt;div id="testing-not-reproduced" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#testing-not-reproduced" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Initial table creation script:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Switch to non-superuser
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; lzldbdata
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- create table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PUBLIC&lt;/span&gt;.LZLPARTITION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; APPL_NO varchar(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	IS_DELETED varchar(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DATE_CREATED &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; now(),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DATE_UPDATED &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; RANGE(DATE_CREATED);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; DATE_LZLPARTITION &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PUBLIC&lt;/span&gt;.LZLPARTITION (DATE_CREATED);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; NK_LZLPARTITION &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PUBLIC&lt;/span&gt;.LZLPARTITION (APPL_NO);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- privs
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.LZLPARTITION &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; r_lzldbdata_qry;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.LZLPARTITION &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; r_lzldbdata_dml;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202301 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202302 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202303 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202304 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202305 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION_202306 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01 00:00:00&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Generate data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.LZLPARTITION
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; n &lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; to_char(to_date(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;YYYY-MM-DD&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;+&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt; &lt;span style="color:#f92672"&gt;||&lt;/span&gt; n &lt;span style="color:#f92672"&gt;||&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39; minute&amp;#39;&lt;/span&gt;) ::interval, &lt;span style="color:#e6db74"&gt;&amp;#39;YYYY-MM-DD&amp;#39;&lt;/span&gt;)::&lt;span style="color:#e6db74"&gt;&amp;#34;date&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;300000&lt;/span&gt;) n&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Data distribution:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;),tableoid::regclass &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tableoid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;44640&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;40320&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202302
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;44640&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;43200&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202304
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;44640&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202305
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;43200&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202306
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;39361&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202307&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Permissions not inherited:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+--------------+-------------------+-------------------------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldbdata&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwdDxt&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r_lzldbdata_qry&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;+|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; r_lzldbdata_dml&lt;span style="color:#f92672"&gt;=&lt;/span&gt;arwd&lt;span style="color:#f92672"&gt;/&lt;/span&gt;lzldbdata &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition_202306
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------------------+-------+-------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition_202306 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Execution plan (correct):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; APPL_NO &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;217450&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; IS_DELETED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;N&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; DATE_CREATED &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;77&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Subplans Removed: &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzlpartition_202305_appl_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition_202305 lzlpartition_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;217450&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzlpartition_202306_appl_no_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition_202306 lzlpartition_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((appl_no)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;217450&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((is_deleted)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; now()) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;31 days&amp;#39;&lt;/span&gt;::interval &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;)))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The permissions were still not inherited. In fact, we tested on other PostgreSQL versions and observed the same behavior — it seems to be a general behavior.&lt;/p&gt;
&lt;p&gt;However, even so, we couldn&amp;rsquo;t reproduce the issue. The test results used the correct index, unlike the production environment which used the wrong index.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since we had collected statistics and killed sessions, it shouldn&amp;rsquo;t have been a cached execution plan issue. After executing GRANT, the partition execution plan immediately became correct (even granting just one partition fixed that specific partition), so we are fairly confident that the partition permission issue caused the abnormal partition execution plan.&lt;/p&gt;
&lt;p&gt;The analysis and resolution process can be summarized as follows:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Switch to the application user to view the execution plan. Using the superuser to view execution plans is a common practice, but the plan seen from the superuser may not be correct.&lt;/li&gt;
&lt;li&gt;Permissions on child partitions of partitioned tables. The root cause is that permissions on child partitions of PostgreSQL partitioned tables were inconsistent with the parent table, causing the execution plan to be abnormal. In other words, permission issues affected PostgreSQL&amp;rsquo;s execution plan.&lt;/li&gt;
&lt;li&gt;This issue is difficult to reproduce and occurs very, very rarely.&lt;/li&gt;
&lt;li&gt;Permission-caused execution plan anomalies are extremely subtle and hard to diagnose.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Two questions worth deeper discussion:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Permission issues shouldn&amp;rsquo;t affect execution plans. Why do permissions affect execution plans?&lt;/li&gt;
&lt;li&gt;Child partition permissions are inconsistent with parent table permissions. Why don&amp;rsquo;t child partitions fully inherit parent table permissions?&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;A bug report has been submitted to see what the official team says.&lt;/p&gt;</content:encoded></item><item><title>My 2023 Year-End Summary</title><link>https://lastdba.com/en/2024/08/12/my-2023-year-end-summary/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/my-2023-year-end-summary/</guid><description>&lt;h2 class="relative group"&gt;As a DBA
 &lt;div id="as-a-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#as-a-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since early 2023, I set my main task for the year — &lt;strong&gt;learn the PostgreSQL database&lt;/strong&gt;. Though I didn&amp;rsquo;t set detailed plans, the overall goal was to finish learning some foundational PostgreSQL knowledge. Later I found I had oversimplified things — the cost of learning PostgreSQL was far greater than I imagined, and I didn&amp;rsquo;t achieve this goal in 2023. For example, the PostgreSQL transaction chapter: I thought I could finish it in 2 weeks, but it took me about 2 months. Regardless, persistent learning did yield some results:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;As a DBA
 &lt;div id="as-a-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#as-a-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since early 2023, I set my main task for the year — &lt;strong&gt;learn the PostgreSQL database&lt;/strong&gt;. Though I didn&amp;rsquo;t set detailed plans, the overall goal was to finish learning some foundational PostgreSQL knowledge. Later I found I had oversimplified things — the cost of learning PostgreSQL was far greater than I imagined, and I didn&amp;rsquo;t achieve this goal in 2023. For example, the PostgreSQL transaction chapter: I thought I could finish it in 2 weeks, but it took me about 2 months. Regardless, persistent learning did yield some results:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/1783a492fc8a.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;Among them, the optimizer chapter was actually not completed. Though I&amp;rsquo;m guilty, I still need to explain. The optimization chapter has been in progress for over two months — not because I was slacking off, but because it&amp;rsquo;s simply impossible to finish. It has already reached Typora&amp;rsquo;s text limit — around 8000 characters it starts lagging, so I had to passively split it into parts. It&amp;rsquo;s already split to Part 4:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c2313549e7fe.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;Even so, the optimization chapter is probably less than half done. I can only shamelessly carry it over to the next year&amp;hellip; Personally, I think another 4 months should let me complete the optimization chapter&amp;hellip; Even then, the priority needs to be pushed back — there&amp;rsquo;s really not enough time!&lt;/p&gt;

&lt;h2 class="relative group"&gt;READING
 &lt;div id="reading" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#reading" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;My main profession is databases, so I should spend time on databases, and extracurricular reading should take a back seat. However, I still don&amp;rsquo;t want to give up this part, for three reasons I think:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The value brought by reading is immeasurable in the short term&lt;/li&gt;
&lt;li&gt;Reading brings a pleasant sense of intellectual enrichment&lt;/li&gt;
&lt;li&gt;I use fragmented time to read, only spending 2-3 hours writing reading notes, which doesn&amp;rsquo;t take up too much study time&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;ve certainly read some PostgreSQL technical books, but I read them with a targeted approach. For example, for optimization, I&amp;rsquo;d bring together &amp;ldquo;The Internals of PostgreSQL,&amp;rdquo; &amp;ldquo;PostgreSQL Technical Internals: Query Optimization Deep Dive,&amp;rdquo; &amp;ldquo;PostgreSQL Query Engine Source Code Technical Analysis,&amp;rdquo; and &amp;ldquo;The Art of Database Query Optimizer&amp;rdquo; to study a particular knowledge point together. I wasn&amp;rsquo;t focused on whether I&amp;rsquo;d finish them, and I didn&amp;rsquo;t read them cover-to-cover in order. So the reading list here only covers extracurricular books.&lt;/p&gt;
&lt;p&gt;2023 Extracurricular Reading List (ranked by preference):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&amp;ldquo;Homo Deus&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Romance of the Three Kingdoms&amp;rdquo;&lt;/li&gt;
&lt;li&gt;The &amp;ldquo;Space Odyssey&amp;rdquo; series: 2001, 2010, 2060, 3001&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Elon Musk&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Chimpanzee Politics&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Goodbye, the Age of Mediocrity&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Wild&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Are We Smart Enough to Know How Smart Animals Are?&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;To Kill a Mockingbird&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Rich Dad Poor Dad&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;When Breath Becomes Air&amp;rdquo;&lt;/li&gt;
&lt;li&gt;&amp;ldquo;The Metamorphosis,&amp;rdquo; &amp;ldquo;The Judgment,&amp;rdquo; &amp;ldquo;A Hunger Artist&amp;rdquo; and other Kafka short stories&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Not great: &amp;ldquo;What Life Could Mean to You,&amp;rdquo; &amp;ldquo;How to Win Friends and Influence People,&amp;rdquo; &amp;ldquo;The Courage to Be Disliked&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Blog and WeChat Official Account
 &lt;div id="blog-and-wechat-official-account" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#blog-and-wechat-official-account" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I publish articles through two channels:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CSDN Blog: &lt;a href="https://liuzhilong.blog.csdn.net" target="_blank" rel="noreferrer"&gt;https://liuzhilong.blog.csdn.net&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;WeChat Official Account: liuzhilong62&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&amp;rsquo;ve been persisting with blogging for many years. The big change in 2023 was mainly writing about PostgreSQL and increasing technical depth. The WeChat Official Account is a new venture I started this year, and it was a major experiment in 2023. Both blogs and official accounts can be used for technical sharing, but their audiences are somewhat different. A blog can serve as a technical accumulation, while an official account is more like a technical news feed. There are many big names in the community who publish daily (even multiple times a day) — I greatly admire that. But there are also big names who focus on quality articles without worrying about daily posting. I personally prefer the latter approach — learning a domain&amp;rsquo;s knowledge roughly in one go, which feels more holistic and targeted. Often I split longer articles into parts for the official account (I don&amp;rsquo;t even like reading overly long articles myself). On my blog I don&amp;rsquo;t split them, so readers interested in a particular article can search for it on CSDN — it&amp;rsquo;s easier to read there.&lt;/p&gt;
&lt;p&gt;Why write?&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Self-learning value&lt;/li&gt;
&lt;li&gt;Technical research value&lt;/li&gt;
&lt;li&gt;Dissemination value&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The efficiency of active learning far exceeds passive learning, just like this learning pyramid (image from &amp;ldquo;Rich Dad Poor Dad&amp;rdquo; — the value of extracurricular reading!):&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a5205d91518d.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;Opportunities like hands-on practice and presentations are rare and hard to come by. Outputting what you&amp;rsquo;ve learned as articles greatly improves your understanding of knowledge points. Reading an article might take just ten minutes, but producing it as an article may take more than ten times that long.&lt;/p&gt;
&lt;p&gt;This year I also tried doing pure translation-style technical articles. Although the technical research value isn&amp;rsquo;t high, there&amp;rsquo;s still learning value and dissemination value. Reading something once versus translating it once leads to different levels of understanding, just like what I said above: active learning. However, what bothers me a bit now is: previously, for things I couldn&amp;rsquo;t understand, I&amp;rsquo;d use Google Translate for a rough pass and then polish it myself. Now with GPT, it can translate an entire article and I barely need to change any words or sentences. The active learning value has been severely diluted — the AI is doing all the learning&amp;hellip;&lt;/p&gt;
&lt;p&gt;My writing style changed significantly in 2023. I wrote about various things and tried everything. Of course, I know one should focus on vertical content, but I still couldn&amp;rsquo;t resist doing random things — I haven&amp;rsquo;t even settled on a name for my official account yet. Currently, what&amp;rsquo;s clear is: technical articles and extracurricular reading notes, with technical articles as the main focus. Other types of articles probably won&amp;rsquo;t be written anymore. Whether I&amp;rsquo;ll adjust later, I don&amp;rsquo;t know. At least the official account still has room for adjustment. Anyway, let&amp;rsquo;s keep it like this — launch first, adjust later.&lt;/p&gt;
&lt;p&gt;2023 blog statistics are hard to track now. I can only provide blog data from 2017 to 2023 as a snapshot.&lt;/p&gt;
&lt;p&gt;CSDN Blog:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d6b07e98d21d.png" alt="Image" /&gt;&lt;/p&gt;
&lt;p&gt;WeChat Official Account followers:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/dd005d674681.png" alt="Image" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Final Thoughts
 &lt;div id="final-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#final-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The biggest realization of 2023 — time. There&amp;rsquo;s really not enough time!&lt;/p&gt;
&lt;p&gt;On June 17, 2023, I participated in the PostgreSQL Database Technology Summit Chengdu stop and shared my fresh, hot-off-the-press PostgreSQL transaction knowledge with the experts. It was my first time on stage and I was quite nervous. I must thank Boss Can for the opportunity. There was a small episode during this sharing that shows how pressed for time I was in 2023. I also had part-time graduate studies — the day of the sharing was also my final exam day. After finishing my talk, I rushed straight to the airport&amp;hellip; In the end, I missed 3 exams and had to retake them&amp;hellip; It was too hard.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;ve completely given up on work-life balance — having a work-learning balance would be good enough. Every day after work I don&amp;rsquo;t think about resting but about going home to study. In the end, there were still many things unfinished, left to my 2024 self.&lt;/p&gt;
&lt;p&gt;Expectations for 2024:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Complete my thesis and graduate smoothly&lt;/li&gt;
&lt;li&gt;Finish the PostgreSQL optimization section&lt;/li&gt;
&lt;li&gt;We&amp;rsquo;ll see about the rest&lt;/li&gt;
&lt;/ol&gt;</content:encoded></item><item><title>ORDER BY LIMIT 10 Slower Than ORDER BY LIMIT 100</title><link>https://lastdba.com/en/2024/08/12/order-by-limit-10-slower-than-order-by-limit-100/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/order-by-limit-10-slower-than-order-by-limit-100/</guid><description>&lt;h2 class="relative group"&gt;Problem Analysis
 &lt;div id="problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When executing SQL in a PostgreSQL database, &lt;code&gt;ORDER BY LIMIT 10&lt;/code&gt; runs slower than &lt;code&gt;ORDER BY LIMIT 100&lt;/code&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Execution Plan Analysis
 &lt;div id="execution-plan-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#execution-plan-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; cl.ITEM_DESC &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablelzl2 cl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; item_name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;name&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; cl.ITEM_NO&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;abcdefg&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;item&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablelzl1 RI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; RI.column1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;AAAA&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; RI.column2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;applyno20231112&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RI.column3 &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.43..1522.66 rows=10 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..158007.45 rows=1038 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The main table does not use the &lt;code&gt;column2&lt;/code&gt; index. Instead it uses an &lt;strong&gt;Index Scan Backward&lt;/strong&gt; on the &lt;code&gt;column3&lt;/code&gt; sort index. The scan cost for the index is very high, yet the final cost looks low. Actual execution takes 9 seconds.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Analysis
 &lt;div id="problem-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When executing SQL in a PostgreSQL database, &lt;code&gt;ORDER BY LIMIT 10&lt;/code&gt; runs slower than &lt;code&gt;ORDER BY LIMIT 100&lt;/code&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Execution Plan Analysis
 &lt;div id="execution-plan-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#execution-plan-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; cl.ITEM_DESC &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablelzl2 cl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; item_name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;name&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; cl.ITEM_NO&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;abcdefg&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;item&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablelzl1 RI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; RI.column1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;AAAA&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; RI.column2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;applyno20231112&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RI.column3 &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.43..1522.66 rows=10 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..158007.45 rows=1038 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The main table does not use the &lt;code&gt;column2&lt;/code&gt; index. Instead it uses an &lt;strong&gt;Index Scan Backward&lt;/strong&gt; on the &lt;code&gt;column3&lt;/code&gt; sort index. The scan cost for the index is very high, yet the final cost looks low. Actual execution takes 9 seconds.&lt;/p&gt;
&lt;p&gt;Changing &lt;code&gt;LIMIT 10&lt;/code&gt; to &lt;code&gt;LIMIT 100&lt;/code&gt; yields a normal execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; cl.ITEM_DESC &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablelzl2 cl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; cl.ITEM_NAME &lt;span style="color:#f92672"&gt;=&lt;/span&gt; RI.MANUAL_SIGN &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; cl.ITEM_NO&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;manualSign&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;manualSign&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablelzl1 RI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; RI.column1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;AAAA&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; RI.column2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;applyno20231112&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RI.column3 &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Limit (cost=2632.28..3162.78 rows=100 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Result (cost=2632.28..8138.87 rows=1038 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Sort (cost=2632.28..2634.87 rows=1038 width=474)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort Key: ri.column3 DESC
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using idx_cri_column2 on tablelzl1 ri (cost=0.43..2592.61 rows=1038 width=474)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((column1)::text = &amp;#39;AAAA&amp;#39;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(10 rows)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The subquery plan remains unchanged. The main table now uses the &lt;code&gt;column2&lt;/code&gt; single-column index, fetches rows, sorts, then applies LIMIT — execution is extremely fast.&lt;/p&gt;
&lt;p&gt;This is not just about LIMIT values — changing only the &lt;code&gt;column2&lt;/code&gt; value in the original SQL can also produce a normal plan. In practice, only a few specific &lt;code&gt;column2&lt;/code&gt; values trigger the abnormal plan.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Execution plan comparison:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;column2&lt;/em&gt; is a filter column, &lt;em&gt;column3&lt;/em&gt; is a sort column. The two plans choose different indexes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Abnormal &lt;code&gt;LIMIT 10&lt;/code&gt; plan:&lt;/strong&gt; &lt;em&gt;Backward scan sort-column index → fetch rows → limit&lt;/em&gt;. No extra sort needed; scanning backward, it can stop as soon as it finds enough rows matching the LIMIT. The estimated cost of scanning the sort-column index is very high, but the top-level LIMIT cost estimate is very low.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Normal &lt;code&gt;LIMIT 100&lt;/code&gt; plan:&lt;/strong&gt; &lt;em&gt;Access filter-column index → fetch rows → sort by sort column → limit&lt;/em&gt;. Because sorting is required, all matching index entries must be retrieved. The filter-column index scan itself has a low cost estimate.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So the key issue is: the optimizer &lt;strong&gt;underestimates the cost of a partial backward scan on the sort index&lt;/strong&gt;.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Actual Execution
 &lt;div id="actual-execution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#actual-execution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Let&amp;rsquo;s look at &lt;code&gt;explain (analyze,buffers)&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.43..1521.93 rows=10 width=990) (actual time=23.311..8122.516 rows=10 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=861100 read=42985 dirtied=7
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=6741.003
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..157932.45 rows=1038 width=990) (actual time=23.309..8122.505 rows=10 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rows Removed by Filter: 1521796
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=861100 read=42985 dirtied=7
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=6741.003
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18) (actual time=0.005..0.005 rows=0 loops=10)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=121 read=28
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=1.476
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: 2.314 ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: 8122.658 ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=2632.28..3162.78 rows=100 width=990) (actual time=150.101..150.122 rows=14 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=700 read=274
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=146.903
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Result (cost=2632.28..8138.87 rows=1038 width=990) (actual time=150.100..150.119 rows=14 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=700 read=274
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=146.903
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Sort (cost=2632.28..2634.87 rows=1038 width=474) (actual time=150.072..150.073 rows=14 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort Key: ri.column3 DESC
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort Method: quicksort Memory: 30kB
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=694 read=274
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=146.903
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using idx_cri_column2 on tablelzl1 ri (cost=0.43..2592.61 rows=1038 width=474) (actual time=0.418..149.973 rows=14 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((column1)::text = &amp;#39;AAAA&amp;#39;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rows Removed by Filter: 1218
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=691 read=274
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; I/O Timings: read=146.903
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18) (actual time=0.002..0.002 rows=0 loops=14)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit=6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: 0.334 ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: 150.257 ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;LIMIT 10&lt;/code&gt; plan executes in 8 seconds: shared hit=861,100, disk read=42,985, &lt;strong&gt;1,521,796 rows removed by filter&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;LIMIT 100&lt;/code&gt; plan executes in 0.15 seconds: shared hit=694, read=274, 1,218 rows removed.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;LIMIT 10&lt;/code&gt; plan is clearly abnormal — it &lt;strong&gt;reads far too many rows before finding qualifying ones&lt;/strong&gt;, which is why the query is slow.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Statistics Analysis
 &lt;div id="statistics-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#statistics-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The estimated cost is low, but the actual scan touches many index rows. First, check whether the statistics are accurate.&lt;/p&gt;
&lt;p&gt;Table statistics:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@cnsz381785:7169/(rasesql)phmamp][10-30.15:01:26]M=# select relpages,reltuples::bigint from pg_class where relname=&amp;#39;tablelzl1&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relpages | reltuples 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 91172 | 2280874 -- roughly matches actual count&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Column statistics:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[phmampopr@cnsz381785:7169/(rasesql)phmamp][10-27.17:08:48]M=&amp;gt; select * from pg_stats where tablename=&amp;#39;tablelzl1&amp;#39; and attname=&amp;#39;column2&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-[ RECORD 1 ]----------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;schemaname | public
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tablename | tablelzl1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;attname | column2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;inherited | f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;null_frac | 0
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;avg_width | 18
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;n_distinct | -0.11990886
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_vals | {applyno20231112,DY20190723006650,DY20200102012899,DY20180827000557,DY20190524001304,DY20190529001885,DY20190728002359}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_freqs | {0.0005,0.00026666667,0.00023333334,0.0002,0.0002,0.0002,0.0002}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;histogram_bounds | {CULZF0000121605605,DSNEW0000126854232,DSNEW0000137652871,DY20160516001057,DY20161104005509,DY20170306002677,DY20170703010428,DY20170928013517,DY20180410007383,DY20180615002936,DY20180
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;correlation | 0.3131596
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_elems | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;most_common_elem_freqs | [null]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;elem_count_histogram | [null]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The value &lt;code&gt;applyno20231112&lt;/code&gt; happens to be the top &lt;code&gt;most_common_vals&lt;/code&gt;, with an estimated frequency of 0.0005. Multiplying: 2,280,874 × 0.0005 = 1,140, which is close to the real count of 1,232.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@cnsz381785:7169/(rasesql)phmamp][10-30.15:05:28]M=# select count(*) from tablelzl1 where column2 = &amp;#39;applyno20231112&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; count 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1232&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Statistics are accurate. Running &lt;code&gt;ANALYZE&lt;/code&gt; to recollect statistics would not fix this.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Effect of Uneven Data Distribution
 &lt;div id="the-effect-of-uneven-data-distribution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-effect-of-uneven-data-distribution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Using the current statistics, the estimated number of matching rows is ~1,140. On average, finding the first matching row through the sort-column index would require scanning 2,280,874 / 1,140 ≈ 2,000 index entries. For 10 rows, about 20,000 entries; for 100 rows, about 200,000 entries.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s disable sort and force the &lt;code&gt;LIMIT 100&lt;/code&gt; statement to use the sort-column index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_sort&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--limit 100 execution plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.43..15222.69 rows=100 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..158007.45 rows=1038 width=990)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; SubPlan 1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan using uk_tablelzl2_ii on tablelzl2 cl (cost=0.27..5.29 rows=1 width=18)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Index Cond: (((item_no)::text = &amp;#39;manualSign&amp;#39;::text) AND ((item_name)::text = (ri.manual_sign)::text))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;LIMIT 10&lt;/code&gt; becomes &lt;code&gt;LIMIT 100&lt;/code&gt;, the cost jumps from 1522.66 to 15222.69 — roughly a ×10 multiplication. The &lt;code&gt;LIMIT 100&lt;/code&gt; cost of 15222.69 now exceeds the filter-column index plan&amp;rsquo;s cost of 3162.78, so the optimizer switches indexes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The above estimates all assume data is evenly scattered across the sort-column index. In reality, the data could be at the very end (backward scan finds it quickly), or all concentrated in the first few leaf pages (requiring nearly a full index scan + fetch), making the cost extremely high.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The correlation between the two columns — how the data is distributed across the index — determines whether using the sort-column index is efficient.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s look at how many rows were actually scanned:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Index Scan Backward using idx_tablelzl1_column3 on tablelzl1 ri (cost=0.43..157932.45 rows=1038 width=990) (actual time=23.309..8122.505 rows=10 loops=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((column1)::text = &amp;#39;AAAA&amp;#39;::text) AND ((column2)::text = &amp;#39;applyno20231112&amp;#39;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Rows Removed by Filter: 1521796&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In reality, about &lt;strong&gt;1,521,796 rows&lt;/strong&gt; were scanned to find just 10 matching rows. The estimate was 20,000 — a &lt;strong&gt;76× discrepancy&lt;/strong&gt;!&lt;/p&gt;

&lt;h2 class="relative group"&gt;Trigger Conditions
 &lt;div id="trigger-conditions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#trigger-conditions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Must involve &lt;code&gt;WHERE&lt;/code&gt; + &lt;code&gt;ORDER BY&lt;/code&gt; + &lt;code&gt;LIMIT&lt;/code&gt; clauses&lt;/li&gt;
&lt;li&gt;Both the sort column and filter column must have indexes&lt;/li&gt;
&lt;li&gt;The LIMIT value is typically not very large&lt;/li&gt;
&lt;li&gt;Uneven data distribution&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Solution
 &lt;div id="solution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#solution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Rewrite the SQL: add an expression to prevent the &lt;code&gt;ORDER BY&lt;/code&gt; column from using its index.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; cl.ITEM_DESC &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tablelzl2 cl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; cl.ITEM_NAME &lt;span style="color:#f92672"&gt;=&lt;/span&gt; RI.MANUAL_SIGN &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; cl.ITEM_NO&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;manualSign&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;manualSign&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablelzl1 RI
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; RI.column1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;AAAA&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; RI.column2 &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;applyno20231112&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RI.column3 &lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;How Oracle Handles This
 &lt;div id="how-oracle-handles-this" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#how-oracle-handles-this" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Cost Estimation Differences in Execution Plans
 &lt;div id="cost-estimation-differences-in-execution-plans" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cost-estimation-differences-in-execution-plans" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;From the analysis above, the PostgreSQL execution plan&amp;rsquo;s cost looks unbalanced — the upper-level cost is lower than the inner-level cost, unlike Oracle&amp;rsquo;s hierarchical accumulation.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s run an experiment: a table containing only rows where &lt;code&gt;colname='x'&lt;/code&gt;, comparing how PostgreSQL and Oracle calculate costs:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@cnsz381785:7169/(rasesql)dbmgr][10-31.14:32:19]M=# explain select * from testlzl where col1=&amp;#39;x&amp;#39; limit 1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.00..0.02 rows=1 width=2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Seq Scan on testlzl (cost=0.00..17747.20 rows=1048576 width=2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((col1)::text = &amp;#39;x&amp;#39;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres@cnsz381785:7169/(rasesql)dbmgr][10-31.14:32:30]M=# explain select * from testlzl where col1=&amp;#39;xx&amp;#39; limit 1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Limit (cost=0.00..17747.20 rows=1 width=2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Seq Scan on testlzl (cost=0.00..17747.20 rows=1 width=2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((col1)::text = &amp;#39;xx&amp;#39;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;col1='x'&lt;/code&gt;, the row is found immediately, but the LIMIT cost is not pushed down into the seq scan cost — the total cost is 17747.20, the same as scanning the whole table. The LIMIT cost is not pushed into the inner node&amp;rsquo;s cost, but the &lt;strong&gt;rows estimate is&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s see how Oracle handles the same case:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SYS@t8icss1&amp;gt; select * from dbmgr.testlzl where a=&amp;#39;x&amp;#39; and rownum&amp;lt;=1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;1 row selected.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: 2045386539
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 0 | SELECT STATEMENT | | 1 | 2 | 2 (0)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 1 | COUNT STOPKEY | | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 2 | TABLE ACCESS FULL| TESTLZL | 1 | 2 | 2 (0)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Predicate Information (identified by operation id):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1 - filter(ROWNUM&amp;lt;=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2 - filter(&amp;#34;A&amp;#34;=&amp;#39;x&amp;#39;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SYS@t8icss1&amp;gt; select * from dbmgr.testlzl where a=&amp;#39;xx&amp;#39; and rownum&amp;lt;=1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;no rows selected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: 2045386539
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 0 | SELECT STATEMENT | | 1 | 2 | 302 (2)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 1 | COUNT STOPKEY | | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 2 | TABLE ACCESS FULL| TESTLZL | 1 | 2 | 302 (2)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Predicate Information (identified by operation id):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 1 - filter(ROWNUM&amp;lt;=1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 2 - filter(&amp;#34;A&amp;#34;=&amp;#39;xx&amp;#39;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In Oracle, when &lt;code&gt;a='x'&lt;/code&gt; is found immediately, the STOPKEY cost is pushed into the inner node — cost is only 2. When the data doesn&amp;rsquo;t exist (&lt;code&gt;a='xx'&lt;/code&gt;), the full scan cost is 302.&lt;/p&gt;
&lt;p&gt;This is an important difference between Oracle and PostgreSQL cost calculation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In Oracle, the outer node cost is always ≥ the inner node cost; in PostgreSQL, this is not guaranteed.&lt;/li&gt;
&lt;li&gt;Oracle&amp;rsquo;s inner node cost incorporates outer operators (e.g., STOPKEY); PostgreSQL does not — it gives the full cost of the child path.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Oracle and Uneven Data Distribution
 &lt;div id="oracle-and-uneven-data-distribution" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oracle-and-uneven-data-distribution" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Knowing the principle, we can reproduce the issue by placing data at the beginning of the sort index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl(a char(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,b char(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Insert bulk data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; i &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;100000&lt;/span&gt; loop
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;test&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; loop;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;end&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Insert special data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;aaaa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aaaa&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;zzzz&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;zzzz&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_a &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_b &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl(b);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Collect statistics
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;EXEC&lt;/span&gt; DBMS_STATS.GATHER_TABLE_STATS(OWNNAME&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;SYS&amp;#39;&lt;/span&gt;,TABNAME&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;TLZL&amp;#39;&lt;/span&gt;,estimate_percent &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;, degree&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,METHOD_OPT&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;FOR ALL COLUMNS SIZE AUTO&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;cascade&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#75715e"&gt;/*+ index(tlzl idx_a)*/&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;aaaa&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; rownum&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#75715e"&gt;/*+ index(tlzl idx_a)*/&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zzzz&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; rownum&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SYS@t8icss1&amp;gt; select * from (select /*+ index(tlzl idx_a)*/* from tlzl where b=&amp;#39;aaaa&amp;#39; order by a) where rownum&amp;lt;=1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: 3674066029
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 0 | SELECT STATEMENT | | 1 | 204 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 1 | COUNT STOPKEY | | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 2 | VIEW | | 1 | 204 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 3 | TABLE ACCESS BY INDEX ROWID| TLZL | 1 | 202 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 4 | INDEX FULL SCAN | IDX_A | 98830 | | 779 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;SYS@t8icss1&amp;gt; select * from (select /*+ index(tlzl idx_a)*/* from tlzl where b=&amp;#39;zzzz&amp;#39; order by a) where rownum&amp;lt;=1; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: 3674066029
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 0 | SELECT STATEMENT | | 1 | 204 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 1 | COUNT STOPKEY | | | | | |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 2 | VIEW | | 1 | 204 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;|* 3 | TABLE ACCESS BY INDEX ROWID| TLZL | 1 | 202 | 2210 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;| 4 | INDEX FULL SCAN | IDX_A | 98830 | | 779 (1)| 00:00:01 |
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------------------------------------------------------------------------------------&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Oracle&amp;rsquo;s optimizer has the same limitation — it doesn&amp;rsquo;t know where the data actually sits within the index. Whether the data is at the first or last position in the index, the estimated cost is the same.&lt;/p&gt;
&lt;p&gt;However, Oracle provides more tools to address this: extended statistics, Automatic Column Group Detection, plan baselines, etc.&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="http://www.postgres.cn/v2/news/viewone/1/717" target="_blank" rel="noreferrer"&gt;http://www.postgres.cn/v2/news/viewone/1/717&lt;/a&gt;
&lt;a href="https://oracle-base.com/articles/12c/automatic-column-group-detection-extended-statistics-12cr1" target="_blank" rel="noreferrer"&gt;https://oracle-base.com/articles/12c/automatic-column-group-detection-extended-statistics-12cr1&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>People from Another World</title><link>https://lastdba.com/en/2024/08/12/people-from-another-world/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/people-from-another-world/</guid><description>&lt;p&gt;​


&lt;img src="https://lastdba.com/img/csdn/13ea4ca5d98b.png" alt="" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Vacation
 &lt;div id="vacation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vacation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I took a long vacation and went back to my hometown before my leave days expired — not just to escape the busyness of work, but also to visit my grandparents. For working people like us, going back to our hometown is really difficult. If it&amp;rsquo;s just a weekend trip, we&amp;rsquo;d only get one day of rest before having to head back — too exhausting. We don&amp;rsquo;t get many vacation days to begin with, and when we do, most people think about driving out to see some scenery or just staying home for a few days doing nothing. No one usually thinks of using their precious leave to visit elderly relatives back home.&lt;/p&gt;</description><content:encoded>&lt;p&gt;​


&lt;img src="https://lastdba.com/img/csdn/13ea4ca5d98b.png" alt="" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Vacation
 &lt;div id="vacation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vacation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I took a long vacation and went back to my hometown before my leave days expired — not just to escape the busyness of work, but also to visit my grandparents. For working people like us, going back to our hometown is really difficult. If it&amp;rsquo;s just a weekend trip, we&amp;rsquo;d only get one day of rest before having to head back — too exhausting. We don&amp;rsquo;t get many vacation days to begin with, and when we do, most people think about driving out to see some scenery or just staying home for a few days doing nothing. No one usually thinks of using their precious leave to visit elderly relatives back home.&lt;/p&gt;
&lt;p&gt;Ironically, the leave I used to visit my grandparents was childcare leave, not some kind of &amp;ldquo;eldercare leave.&amp;rdquo; It seems the world doesn&amp;rsquo;t have such a thing as &amp;ldquo;eldercare leave&amp;rdquo; — only family visit leave. Although there is legally a &amp;ldquo;family visit leave&amp;rdquo; provision, never mind that it isn&amp;rsquo;t specifically designed for visiting the elderly — just look at those impossibly long qualifiers. For the vast majority of people, family visit leave essentially doesn&amp;rsquo;t exist.&lt;/p&gt;
&lt;p&gt;Using childcare leave not to care for children but to visit the elderly — I imagine most people wouldn&amp;rsquo;t do that. Am I the only oddball who would? Well, at least this is how I see it: raising children and caring for the elderly are equally important; we shouldn&amp;rsquo;t favor one over the other. Society and working people tend to prioritize the former. Regardless, I still wanted to go back and spend time with them, to see what the old couple does every day, how they live, whether they face any difficulties, and how they cope with those difficulties. So I went back, alone.&lt;/p&gt;
&lt;p&gt;The end of the road.
The place where the old couple lives is where I grew up. It&amp;rsquo;s quite hidden — you have to turn off the main road onto a mountain path and go a long way, all the way to the end. It feels like a place cut off from the world. When you arrive there, it&amp;rsquo;s as if all connection to the outside world ceases to exist.&lt;/p&gt;
&lt;p&gt;It&amp;rsquo;s not actually my ancestral hometown, but I prefer to call it that. It&amp;rsquo;s a mining area. Because it&amp;rsquo;s built into the mountainside, the mine has a striking three-dimensional quality — so much so that I&amp;rsquo;m in awe of the predecessors who designed it. I still don&amp;rsquo;t quite know how to describe the administrative level of this place. It&amp;rsquo;s not a village, not a town — more modern than a village but smaller than a town. When I was little I thought this place was huge; now I realize you can walk through the entire mining area in just ten minutes.&lt;/p&gt;
&lt;p&gt;The whole place relies on coal mining as its economic pillar. It once prospered, but now it has declined significantly. There are still miners who go underground, but in the living quarters, you no longer see young people like me. The mine has an elementary school; when I attended, there were about 70 students per grade. Now there are only seven.&lt;/p&gt;
&lt;p&gt;The childhood memories there are overwhelmingly strong — like a paradise, a sanctuary untouched by worldly strife, another world. Being far from modern society, you only need the basics to get by, and time seems to pass slowly. A place like this is indeed very suitable for retirement — and indeed, there are many elderly people here.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Food
 &lt;div id="food" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#food" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;When I was little, the market was fairly lively. I remember the poultry vendor would submerge whole chickens in something black and tar-like before plucking them — the poultry area was always filthy. Now the market no longer sells fresh meat; you can only buy vegetables grown by nearby farmers. If you want fresh meat, you have to go to the village market day or take a bus into the city.&lt;/p&gt;
&lt;p&gt;Because the old couple is extremely frugal, I was initially worried they lived too simply — maybe just rice and vegetables every day. When I went back this time, I didn&amp;rsquo;t tell them exactly when I&amp;rsquo;d arrive. When I got home, I found they had even bought braised duck — I was quite relieved. My return made them very happy, and with just the three of us, they made five or six dishes every day. I even started to wonder if I was there to keep them company or to cause them trouble.&lt;/p&gt;
&lt;p&gt;Maybe I&amp;rsquo;ve been spoiled by the rich flavors of the outside world. At first, when they asked, &amp;ldquo;Is this dish good?&amp;rdquo; I couldn&amp;rsquo;t bring myself to say what I really thought. At moments like this, I recall a line from some book: &amp;ldquo;Humans cannot directly judge the value of something; only by comparing it to something else do they know its worth.&amp;rdquo; The same goes for food. When you taste something for the first time, you don&amp;rsquo;t actually know if it&amp;rsquo;s good or not. If you do know, it must be because you&amp;rsquo;ve already compared it to something in your memory. When I was little and first tried hotpot, adults would always ask, &amp;ldquo;Is this hotpot good?&amp;rdquo; To be honest, I had no idea — I didn&amp;rsquo;t even know what &amp;ldquo;good&amp;rdquo; was supposed to taste like. I just ate.&lt;/p&gt;
&lt;p&gt;Now my palate has indeed grown more demanding, but here, I wanted to reset everything, to press that &amp;ldquo;restore factory settings&amp;rdquo; button. I can say with complete sincerity: what they cook is delicious.&lt;/p&gt;
&lt;p&gt;One more thing: at one point I offered to wash the dishes. They said, &amp;ldquo;Put them down, you don&amp;rsquo;t know how — we wash dishes with rice water. You wouldn&amp;rsquo;t get them clean. Dish soap is full of chemicals; we don&amp;rsquo;t use that stuff.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Traditional Chinese Medicine
 &lt;div id="traditional-chinese-medicine" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#traditional-chinese-medicine" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;On this trip, I discovered a fact: elderly people are extremely dependent on medication. Their medicine cabinets are always stuffed with all kinds of drugs — Western medicine, Chinese medicine, cold medicine, anti-inflammatory drugs, ointments, supplements — a whole pile. Whenever they feel something wrong with their body, they reach for whatever they think will help. During my visit, my rhinitis flared up (an old problem of mine) — nonstop sneezing and runny nose. They kept urging me to take cold medicine, recommending Ganmaoling or cephalosporin. I must have said at least ten times: &amp;ldquo;It&amp;rsquo;s rhinitis, not a cold.&amp;rdquo; They, of course, had no idea how to treat this kind of rhinitis, so they just kept urging me to take cold medicine.&lt;/p&gt;
&lt;p&gt;One day I took them into the city. Besides a supermarket run, the more important errand was buying medicine. Buying medicine meant both Chinese and Western.&lt;/p&gt;
&lt;p&gt;The Chinese medicine was purchased at a Yunnan herbal shop. The shop owner had a buzz cut, a black T-shirt, a silver necklace, and a brown beaded bracelet — he looked quite burly. With his tough-guy appearance, I didn&amp;rsquo;t even dare to speak loudly to him, though my grandfather didn&amp;rsquo;t seem to notice any of that. The shop was mostly filled with herbs I couldn&amp;rsquo;t name, sold by weight, quite expensive — not your typical Chinese medicine. Clearly, my grandfather was a regular customer; the owner knew him. But it seemed my grandfather didn&amp;rsquo;t really know how to pick herbs either: &amp;ldquo;Boss, just weigh me 300 yuan&amp;rsquo;s worth based on my health condition.&amp;rdquo; So the owner grabbed a bit from here, a bit from there, and finally ground everything into powder.&lt;/p&gt;
&lt;p&gt;Maybe I&amp;rsquo;ve studied too much — I&amp;rsquo;ve always been skeptical of traditional Chinese medicine, simply because I find it lacks convincing rationale. I was quite worried they&amp;rsquo;d get scammed; these herbal medicine dealers prey specifically on the elderly. But my grandfather said: &amp;ldquo;Before, your grandmother had constant headaches. After taking this medicine, the headaches stopped.&amp;rdquo; So it seemed to work. Western medicine is indeed far too unfriendly to the elderly.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Western Medicine
 &lt;div id="western-medicine" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#western-medicine" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;After buying the Chinese medicine, we walked a long way to a pharmacy to buy Western medicine. That pharmacy might be one of the few they know.&lt;/p&gt;
&lt;p&gt;The vast majority of drugs in that pharmacy couldn&amp;rsquo;t be reimbursed. Only in a tiny, shabby room deep inside were a small selection of reimbursable drugs. I looked around and barely recognized any of them — all named with chemical formulas, completely incomprehensible. Only things like Ganmaoling and loquat syrup were familiar. My grandfather fell into the same difficulty choosing. He recognized cephalosporin, but the pharmacy girl said they didn&amp;rsquo;t have it. He got a bit angry and said to her: &amp;ldquo;Don&amp;rsquo;t you have any decent medicine?&amp;rdquo; (&amp;ldquo;Decent medicine&amp;rdquo;? I tried to parse what he meant.) The girl pulled out a red box of nicely packaged health supplements from somewhere. My grandfather couldn&amp;rsquo;t read the tiny text on the box, so he asked me to read it to him and tell him what it treated. I looked at it — the thing claimed to treat everything — so I didn&amp;rsquo;t read it and handed it back to the girl. In the end, they only picked up a few common cold and cough remedies.&lt;/p&gt;
&lt;p&gt;My grandfather repeatedly told me along the way that he gets 170 yuan of medical insurance reimbursement per year. I could tell he really, really wanted to spend that 170 yuan, to stockpile some medicine at home. That&amp;rsquo;s why he wanted to go to a Western pharmacy, and that&amp;rsquo;s why we walked all the way to this pharmacy that accepts insurance reimbursement.&lt;/p&gt;
&lt;p&gt;But there was some trouble at checkout. The cashier girl had looked unhappy from the start. She took the medicine and rattled off a bunch of things I didn&amp;rsquo;t understand — and my grandfather clearly didn&amp;rsquo;t either. The only thing we caught was: &amp;ldquo;These can&amp;rsquo;t be reimbursed.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The girl said: &amp;ldquo;There&amp;rsquo;s a threshold fee of 150 yuan for reimbursement, and you haven&amp;rsquo;t paid the threshold fee yet.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;My grandfather said: &amp;ldquo;Is the threshold fee like the 150-yuan bed fee hospitals used to charge?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The girl paused, then said impatiently: &amp;ldquo;Yes, yes, whatever you say is right.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;My grandfather got a bit angry: &amp;ldquo;Forget it, I don&amp;rsquo;t want them!&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I quickly asked the girl what exactly this threshold fee meant. Without a word, she pointed to a notice posted on the window — a table explaining the threshold fee. I couldn&amp;rsquo;t quite make sense of it either, but I understood that this threshold fee had to be paid. I thought about it — when I see a doctor, I just swipe my insurance card directly. What&amp;rsquo;s all this about reimbursement? I was even more confused.&lt;/p&gt;
&lt;p&gt;I said: &amp;ldquo;Can I use my insurance card?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The girl said: &amp;ldquo;Out-of-region cards won&amp;rsquo;t work.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;I said: &amp;ldquo;Can I just pay with Alipay?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Seeing me about to pay, my grandfather immediately stopped me: &amp;ldquo;There&amp;rsquo;s absolutely no way I&amp;rsquo;m letting you pay for this.&amp;rdquo; He pulled cash from his bag and paid. I understood — for me, a hundred-something yuan is nothing, but for them, it&amp;rsquo;s still money they&amp;rsquo;re reluctant to part with.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Technology
 &lt;div id="technology" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#technology" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;We marvel at how fast technology advances, always bringing new things that change our way of life and make it more convenient. Working people chase technology and immerse themselves in it. But for the elderly, technology is an entirely different story.&lt;/p&gt;
&lt;p&gt;In every aspect of elderly people&amp;rsquo;s lives, things involving technology are exceedingly rare. The most commonly used thing is a phone — a smartphone. They seem to have adapted well to the fast-paced entertainment of apps like Douyin (TikTok), and they also play with their phones watching short videos before bed. (What they use is probably not actual Douyin, but some other app with recommended short videos.)&lt;/p&gt;
&lt;p&gt;But that&amp;rsquo;s about the limit. They don&amp;rsquo;t really understand how phones work. For example, when they make phone calls — whether it&amp;rsquo;s my grandmother or grandfather — neither of them hangs up after finishing a call. It&amp;rsquo;s not that they don&amp;rsquo;t want to; they just don&amp;rsquo;t know where to find the hang-up button. If after a call they look at the phone and see a red hang-up button, they&amp;rsquo;ll press it. But if the screen is locked or the screen has changed, they won&amp;rsquo;t know how. My grandmother said to me: &amp;ldquo;Take a look at my phone — after I hang up, why does it keep making noise, keep making noise~&amp;rdquo; In fact, the call hadn&amp;rsquo;t ended at all; the screen had just gone dark and she thought it was hung up. If they call someone else, it&amp;rsquo;s fine, but if they call each other, it could be a disaster — because no one hangs up.&lt;/p&gt;
&lt;p&gt;And WeChat messages — they have absolutely no grasp of how WeChat messages work. They don&amp;rsquo;t know how to find someone&amp;rsquo;s chat window, don&amp;rsquo;t know who sent them a message, don&amp;rsquo;t know where messages go. Later, when we went traveling and I took photos for them, they asked me to put the photos on their phones (meaning in their photo albums). I had to operate both of their phones one by one to download photos from WeChat, making sure the downloaded photos were immediately visible in the album — otherwise, they&amp;rsquo;d never find them.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Traveling
 &lt;div id="traveling" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#traveling" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Taking the old couple out to travel was an important mission of this trip. I hadn&amp;rsquo;t originally planned it — I just wanted to empty my mind, breathe some fresh air, and stay there experiencing the slow passage of time. But they really enjoy going out. As soon as I arrived, my grandfather proactively suggested I could drive them somewhere for fun.&lt;/p&gt;
&lt;p&gt;We visited Zhu De&amp;rsquo;s Former Residence, Langzhong Ancient City, and Nanchong — two days and one night. Traveling with elderly people requires more consideration — they can&amp;rsquo;t sit in a car too long or walk too much. So we couldn&amp;rsquo;t really do that many things. But their philosophy of travel is different from ours; they lean more toward &amp;ldquo;checking in,&amp;rdquo; valuing the fact that they&amp;rsquo;ve &amp;ldquo;been here.&amp;rdquo; So they absolutely must take photos at landmark spots with the place name written on them~&lt;/p&gt;
&lt;p&gt;They also prefer crowded places over scenic spots with few people. In Langzhong, they clearly enjoyed being inside the ancient city — the bustling, noisy, lively atmosphere. They even video-called my aunt and shouted, &amp;ldquo;We&amp;rsquo;re in Langzhong!!&amp;rdquo; (with heavy emphasis), grinning ear to ear. Meanwhile, at White Pagoda Hill (you can drive up, very elderly-friendly), overlooking the panoramic view of Langzhong, I was immersed in a &amp;ldquo;what a view&amp;rdquo; moment. My grandmother looked for two minutes, took two photos, and that was it. I said, &amp;ldquo;Look at the scenery, it&amp;rsquo;s so beautiful — we came all the way up here.&amp;rdquo; She replied, &amp;ldquo;I already looked.&amp;rdquo;&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Shed
 &lt;div id="the-shed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-shed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Usually, my grandfather goes to the small park to watch others play cards, and my grandmother plays mahjong — no money involved.&lt;/p&gt;
&lt;p&gt;Besides cards, the place they spend the most time is the shed downstairs. A few discarded stools and chairs from various families are gathered under the shed, with a stove in the middle where they burn firewood in winter. Everyone upstairs and downstairs knows each other — all grandparent-aged, on very good terms with my grandparents. Neighbors will sit together and chat whenever they&amp;rsquo;re free. This is the most important social venue for the &amp;ldquo;neighborhood&amp;rdquo; (it&amp;rsquo;s not really a neighborhood, just two buildings).&lt;/p&gt;
&lt;p&gt;One evening, I sat in the shed listening to them talk. One grandmother said: &amp;ldquo;Your grandson is so good, taking leave to come back and keep the elderly company. We all say your grandson is wonderful.&amp;rdquo; I was a bit embarrassed, but thought — let this evaluation stay in the minds of these elders.&lt;/p&gt;
&lt;p&gt;One grandfather said: &amp;ldquo;I told xx&amp;rsquo;s family: come back once a month, spend time with the elderly, no need to give them money. What would they use that money for? They can get by just fine. But you&amp;rsquo;ve grown up and left, and without company for a long time, they feel lonely.&amp;rdquo; I thought, this old man really understands things. He added: &amp;ldquo;Once, so-and-so died. His whole family came back for the funeral. They brought him fruit and food — what&amp;rsquo;s the use? Did he get to eat any of it? To put it bluntly, that was all for show — for us to see. Once a person is gone, none of it matters.&amp;rdquo; Wow. This old man truly gets it.&lt;/p&gt;
&lt;p&gt;Coming back once a month is extremely difficult for working people — it&amp;rsquo;s just not realistic. Next year I won&amp;rsquo;t even have this childcare leave anymore. When will I come back next time? I can&amp;rsquo;t think of an answer. As we pass day after day in relentless busyness at work, how do the elderly pass their days — day after day of idleness and loneliness?&lt;/p&gt;

&lt;h2 class="relative group"&gt;Random Thoughts
 &lt;div id="random-thoughts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#random-thoughts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;How should the elderly face death? Every time they mention death, it&amp;rsquo;s always with a joking tone, but more than that, there&amp;rsquo;s resignation. How should a person face death? When I&amp;rsquo;m old and the deadline is approaching, how will I face it?&lt;/p&gt;
&lt;p&gt;While chatting in the shed, I couldn&amp;rsquo;t name any of these grandparent-aged people, but they all remembered me, knew how I grew up. My life seems to be a part of their lives, proof of my existence — even if this memory only lasts for a time. Yet that still has meaning, doesn&amp;rsquo;t it? The bonds of life exist in this way. There are billions of people in this world, and the vast, vast majority are fleeting meteors — remembered by no one, mentioned in no record.&lt;/p&gt;
&lt;p&gt;This society is remarkably unfriendly to the elderly. Social rules are too complex; they struggle to understand phones, healthcare, and insurance systems, so they can only huddle within their own social circles and flee from this incomprehensible society. At the same time, society has developed rapidly in recent years — children have mostly moved away for their own families and careers. For the elderly, they&amp;rsquo;re happy to see their children thriving, but the distance is vast, and mutual companionship is hard to come by. While society focuses on childcare and increasing birth benefits, no one pays attention to the issue of eldercare and companionship. I doubt there will ever be such a thing as &amp;ldquo;eldercare leave.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;My grandmother has poor hearing. Even with a hearing aid, it&amp;rsquo;s only slightly better. Often when I talk to her, she doesn&amp;rsquo;t follow at all and answers about something else entirely. But I can&amp;rsquo;t bring myself to raise my voice — it feels so rude. Leaning in close to speak makes her self-conscious. I suggested they come live with us in Chengdu, but she wouldn&amp;rsquo;t agree under any circumstances. I think maybe it&amp;rsquo;s because her hearing loss makes her afraid of communicating with people, timid in social situations. Only there, in the mining community, do the neighbors treat her well — it gives her a sense of security. One elderly woman said: &amp;ldquo;Being hard of hearing is good — it adds years to your life.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Written — April 2023&lt;/p&gt;
&lt;p&gt;The term &amp;ldquo;Popo&amp;rdquo; (a Chinese term for grandmother) still exists in my generation, but my children no longer say &amp;ldquo;Popo&amp;rdquo; — they say &amp;ldquo;Nainai&amp;rdquo; instead. Perhaps &amp;ldquo;Popo&amp;rdquo; is the last time this term will be used in our family line — may be the last call. Let it be preserved in this essay.&lt;/p&gt;</content:encoded></item><item><title>PG Error: attempted to delete invisible tuple</title><link>https://lastdba.com/en/2024/08/12/pg-error-attempted-to-delete-invisible-tuple/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/pg-error-attempted-to-delete-invisible-tuple/</guid><description>&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL DELETE was failing with &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt;, but SELECT with the same conditions worked fine.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Results of full-table delete and full-table select:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;: attempted &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; invisible tuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: heap_delete, heapam.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;511&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;050&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;231187&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;DELETE found an invisible tuple, but SELECT was fine.&lt;/p&gt;
&lt;p&gt;This seemed very strange at first. PG visibility is determined by the tuple&amp;rsquo;s xmin, xmax, cid and the snapshot&amp;rsquo;s xmin, xmax, xip_list. Although the transaction state and timing of the tuple deletion can affect visibility, if the table data is stable (no ongoing DML), any subsequent snapshot should yield a stable visibility set. There shouldn&amp;rsquo;t be a case where the current transaction&amp;rsquo;s visibility differs from others — DML transaction tuple visibility should be consistent. In other words, in this scenario, the SELECT snapshot and DELETE snapshot shouldn&amp;rsquo;t produce different results.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL DELETE was failing with &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt;, but SELECT with the same conditions worked fine.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Results of full-table delete and full-table select:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;55000&lt;/span&gt;: attempted &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; invisible tuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: heap_delete, heapam.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2500&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;511&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;050&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;231187&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;DELETE found an invisible tuple, but SELECT was fine.&lt;/p&gt;
&lt;p&gt;This seemed very strange at first. PG visibility is determined by the tuple&amp;rsquo;s xmin, xmax, cid and the snapshot&amp;rsquo;s xmin, xmax, xip_list. Although the transaction state and timing of the tuple deletion can affect visibility, if the table data is stable (no ongoing DML), any subsequent snapshot should yield a stable visibility set. There shouldn&amp;rsquo;t be a case where the current transaction&amp;rsquo;s visibility differs from others — DML transaction tuple visibility should be consistent. In other words, in this scenario, the SELECT snapshot and DELETE snapshot shouldn&amp;rsquo;t produce different results.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Analysis
 &lt;div id="analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Finding the Source Code
 &lt;div id="finding-the-source-code" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#finding-the-source-code" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Note the error location: &lt;code&gt;heapam.c:2500&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Find the source at &lt;code&gt;src/backend/access/heap/heapam.c&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Line 2500 is blank; nearby code is:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Before locking the buffer, pin the visibility map page if it appears to
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * be necessary. Since we haven&amp;#39;t got the lock yet, someone else might be
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * in the middle of changing this, so we&amp;#39;ll need to recheck after we have
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the lock.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;PageIsAllVisible&lt;/span&gt;(page))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;visibilitymap_pin&lt;/span&gt;(relation, block, &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;vmbuffer);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;LockBuffer&lt;/span&gt;(buffer, BUFFER_LOCK_EXCLUSIVE);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From the source, it&amp;rsquo;s trying to acquire a lock on the VM, so the problem appears related to the VM file.&lt;/p&gt;

&lt;h3 class="relative group"&gt;The VM File
 &lt;div id="the-vm-file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-vm-file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;What is the VM file?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The VM (Visibility Map) file exists to reduce the time vacuum spends scanning pages. If a page doesn&amp;rsquo;t need vacuuming, it can be skipped, greatly reducing the time spent finding pages that need cleaning. This is the original purpose of the VM file. (It&amp;rsquo;s also sometimes used by index-only scans, but that doesn&amp;rsquo;t apply here since we&amp;rsquo;re doing a sequential scan.)&lt;/p&gt;
&lt;p&gt;The VM file stores two pieces of information:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Whether all tuples on a page are visible. This means the page has no dead tuples needing vacuum.&lt;/li&gt;
&lt;li&gt;Whether all tuples on a page are frozen. This means vacuum freeze doesn&amp;rsquo;t need to visit this page.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/1604296b876a.png" alt="Fig. 6.2. How the VM is used." /&gt;&lt;/p&gt;
&lt;p&gt;The VM helps vacuum find dead tuples while reducing the number of pages scanned. For example, in the diagram above (interdb ftw!), the first page contains no dead tuples, so vacuum can skip it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Finding the VM File&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Every table has a Visibility Map (VM) file (indexes don&amp;rsquo;t have VM files), stored alongside the table file. If a table&amp;rsquo;s filenode is &lt;code&gt;12345&lt;/code&gt;, its VM file is &lt;code&gt;12345_vm&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;First, cd to the data directory:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#75715e"&gt;# show data_directory;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; data_directory 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; /pg/pg6666/data&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Find the file storage location using the database OID and table OID:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,datname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_database &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; datname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;sdp&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;17075&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; sdp
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,relname &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzltab1&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Or:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17075&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Find the data file and VM:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ cd /pg/pg6666/data/base/17075
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ ll 17362*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;86761472&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; 17:43 &lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;40960&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 17362_fsm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt; Nov &lt;span style="color:#ae81ff"&gt;14&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2022&lt;/span&gt; 17362_vm&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;The pg_visibility Extension
 &lt;div id="the-pg_visibility-extension" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pg_visibility-extension" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;pg_visibility&lt;/code&gt; provides page-level visibility information by inspecting VM files, and can detect VM corruption. Since the VM stores &amp;ldquo;are all tuples on this page visible; are all tuples on this page frozen&amp;rdquo; information, &lt;code&gt;pg_visibility&lt;/code&gt; can identify which pages are all-frozen and which are all-visible.&lt;/p&gt;
&lt;p&gt;pg_visibility extension reference: &lt;a href="https://www.postgresql.org/docs/current/pgvisibility.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/pgvisibility.html&lt;/a&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;Useful pg_visibility Functions
 &lt;div id="useful-pg_visibility-functions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#useful-pg_visibility-functions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;pg_visibility_map_summary()&lt;/strong&gt;: Shows the count of all-visible and all-frozen pages in the VM.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pg_check_frozen()&lt;/strong&gt;: Returns rows where a tuple is not frozen but its page is marked all-frozen in the VM. If this function returns results, the VM file is corrupt.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pg_check_visible()&lt;/strong&gt;: Returns rows where a tuple is not visible but its page is marked all-visible in the VM. If this function returns results, the VM file is corrupt.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pg_truncate_visibility_map()&lt;/strong&gt;: Clears the VM file. After clearing, the next vacuum on the table will scan all pages and rebuild the VM.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Repairing the VM File
 &lt;div id="repairing-the-vm-file" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#repairing-the-vm-file" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Check for VM corruption:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_visibility_map_summary(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_visibility_map_summary 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;472&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;472 all-visible pages, 0 all-frozen pages.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_check_frozen(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_check_frozen 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_check_visible(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_check_visible 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;6839&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;6839&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;7296&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1423&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;pg_check_visible()&lt;/code&gt; returning results means &lt;strong&gt;the VM is corrupted&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Now use &lt;code&gt;pg_truncate_visibility_map()&lt;/code&gt; to clear the VM:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_truncate_visibility_map(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_truncate_visibility_map 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;On disk, you can see the VM was cleared:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ll 17362*
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;86761472&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; 10:39 &lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;40960&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; 21:09 17362_fsm
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-rw------- &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; postgres postgres &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; Jun &lt;span style="color:#ae81ff"&gt;27&lt;/span&gt; 18:18 17362_vm&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now verify by vacuuming the table to regenerate the VM file and check it&amp;rsquo;s not corrupted:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;VACUUM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;3692&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;402&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;692&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;q
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; ll &lt;span style="color:#ae81ff"&gt;17362&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 86761472 Jun 28 03:37 17362
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 40960 Jun 9 21:09 17362_fsm
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;rw&lt;span style="color:#75715e"&gt;------- 1 postgres postgres 8192 Jun 28 10:21 17362_vm&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After manual vacuum, the VM was regenerated correctly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_check_visible(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_check_visible 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_check_frozen(&lt;span style="color:#e6db74"&gt;&amp;#39;lzltab1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_check_frozen 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Both checks return empty — VM file is healthy. Repair complete.&lt;/p&gt;
&lt;p&gt;Finally, re-run the SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;##&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzltab1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DELETE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;229766&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;DELETE executes normally. Problem resolved.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Checking the Entire Database for VM Corruption
 &lt;div id="checking-the-entire-database-for-vm-corruption" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#checking-the-entire-database-for-vm-corruption" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Although we fixed one corrupted VM file, we should check the entire database for other VM corruption (requires the &lt;code&gt;pg_visibility&lt;/code&gt; extension installed):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; oid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; relname
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_class
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; relkind &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;m&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;t&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;EXISTS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_check_visible(oid))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXISTS&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; pg_check_frozen(oid)));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If results are returned, there&amp;rsquo;s VM corruption. Use &lt;code&gt;pg_truncate_visibility_map()&lt;/code&gt; to clear the VM, then vacuum to regenerate it, as shown above.&lt;/p&gt;
&lt;p&gt;For versions before 9.6 (which lack the pg_visibility extension), you&amp;rsquo;d need to stop the database, manually delete the VM files, restart, then vacuum to regenerate them.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Does VM Corruption Happen?
 &lt;div id="why-does-vm-corruption-happen" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-does-vm-corruption-happen" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;We traced the issue step by step to VM file corruption, but why did it corrupt?&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;PostgreSQL bugs. PG has had some bugs causing VM corruption (see Visibility Map Problems wiki), but these were all before PG 9.6.1.&lt;/li&gt;
&lt;li&gt;Operating system or hardware issues.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Our version was PG13, so the cause can only be broadly attributed to OS or hardware problems.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Why Did SELECT Succeed But DELETE Fail?
 &lt;div id="why-did-select-succeed-but-delete-fail" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#why-did-select-succeed-but-delete-fail" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A full-table SELECT working while a full-table DELETE errors out seems bizarre. The root cause is VM file corruption.&lt;/p&gt;
&lt;p&gt;As mentioned, the VM file exists to speed up vacuum. Even though we weren&amp;rsquo;t running vacuum, the VM file still needs to be updated — DML operations always update (or at least check) the VM, while SELECT does not change VM state. So in this case, SELECT executed normally, but DELETE errored during VM processing.&lt;/p&gt;
&lt;p&gt;In our case, DELETE scanned the VM and found pages marked all-visible, but the VM was wrong — those pages still contained invisible tuples. This is exactly the &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt; error. Invisible tuples may have already been deleted, and trying to delete them again naturally errors out, violating transaction visibility rules.&lt;/p&gt;
&lt;p&gt;Additionally, index-only scans also use the VM file, so they would also be affected. However, this case involved a sequential scan, so SELECT was unaffected.&lt;/p&gt;

&lt;h2 class="relative group"&gt;VM Corruption Causing Incorrect Index-Only Scan Results
 &lt;div id="vm-corruption-causing-incorrect-index-only-scan-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vm-corruption-causing-incorrect-index-only-scan-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;As mentioned earlier, besides vacuum, index-only scans also use the VM file. Even though our case didn&amp;rsquo;t involve index-only scans, let&amp;rsquo;s dig deeper for completeness.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What Is an Index-Only Scan?
 &lt;div id="what-is-an-index-only-scan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-an-index-only-scan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;As the name suggests, an index-only scan accesses only the index structure to get results, without touching the table. Almost all relational databases support index-only scans because B+tree index structures store key values — if the query only needs key values, an index-only scan is possible.&lt;/p&gt;
&lt;p&gt;However, PostgreSQL&amp;rsquo;s transaction implementation differs significantly from other databases (Oracle, MySQL), giving its index-only scans some unique characteristics.&lt;/p&gt;
&lt;p&gt;PostgreSQL checks tuple visibility via xmin, xmax, and other information in tuple headers, but indexes don&amp;rsquo;t contain this information. This means PG&amp;rsquo;s index-only scans must visit data blocks to check visibility. This is where the VM comes in: since the VM stores all-visible and all-frozen information, pages marked as such don&amp;rsquo;t need visibility checks — the VM has already confirmed their visibility.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/63ed5a39f52d.png" alt="Fig. 7.7. How Index-Only Scans performs" /&gt;&lt;/p&gt;
&lt;p&gt;Another interdb diagram (interdb ftw!). When a query looks up tuples with keys 18 and 19: the page containing key=18 is marked all-visible in the VM, so accessing this tuple only requires the index page and VM file. The page containing key=19 is not marked all-visible, so the index-only scan still needs to visit the data page to check visibility.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Index-Only Scan Returning Incorrect Results
 &lt;div id="index-only-scan-returning-incorrect-results" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-only-scan-returning-incorrect-results" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Because index-only scans consult the VM, and a corrupted VM stores wrong information — e.g., a page&amp;rsquo;s tuples aren&amp;rsquo;t all visible (some may have been deleted), but the page is still marked all-visible — the index-only scan skips the data page visibility check and directly returns index key values that should be invisible.&lt;/p&gt;
&lt;p&gt;You can set &lt;code&gt;enable_indexonlyscan=off&lt;/code&gt; to disable index-only scans and guarantee correct results. Or, as shown above, repair the VM file — which is probably the better choice.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The journey had some twists: at first glance the error seemed like a transaction visibility rule problem, which would have been serious — but it was actually much simpler.&lt;/p&gt;
&lt;p&gt;We traced the &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt; error to the source code, identified it as a VM issue, used the &lt;code&gt;pg_visibility&lt;/code&gt; extension to detect and fix the VM corruption, resolved the DELETE error, and finally explored the relationship between index-only scans and the VM.&lt;/p&gt;
&lt;p&gt;Key takeaways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;pg_visibility&lt;/code&gt; extension can read, check, and clear VM files&lt;/li&gt;
&lt;li&gt;Without VM information, vacuum will generate a new VM&lt;/li&gt;
&lt;li&gt;DML reads/updates VM files; SELECT does not (non-index-only-scan)&lt;/li&gt;
&lt;li&gt;The VM file exists to improve vacuum efficiency, and sometimes index-only scan efficiency&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;attempted to delete invisible tuple&lt;/code&gt; error warrants checking the VM file for corruption&lt;/li&gt;
&lt;li&gt;VM file corruption can cause DML failures and incorrect index-only scan results&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/13/pgvisibility.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/pgvisibility.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://wiki.postgresql.org/wiki/Visibility_Map_Problems" target="_blank" rel="noreferrer"&gt;https://wiki.postgresql.org/wiki/Visibility_Map_Problems&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql07.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql07.html&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Interview Questions - Comprehensive Collection</title><link>https://lastdba.com/en/2024/08/12/postgresql-interview-questions-comprehensive-collection/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/postgresql-interview-questions-comprehensive-collection/</guid><description>&lt;p&gt;Interview questions source: PostgreSQL Apprentice &lt;a href="https://mp.weixin.qq.com/s/DCmO1E31JAbec1M05y2_UQ" target="_blank" rel="noreferrer"&gt;PostgreSQL Interview Questions Collection&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Existing answers: Hehuyi_In &lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/128885660" target="_blank" rel="noreferrer"&gt;Learning and Answering PostgreSQL Interview Questions&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;1. MVCC Implementation and Differences from Oracle
 &lt;div id="1-mvcc-implementation-and-differences-from-oracle" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-mvcc-implementation-and-differences-from-oracle" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ORACLE and MYSQL both use UNDO to implement multi-version concurrency control. Undo entries are recorded in &lt;strong&gt;additional&lt;/strong&gt; undo tablespaces. If the UNDO segment is insufficient, an ora-01555 error occurs.



&lt;img src="https://lastdba.com/img/csdn/fec3e1c0263f.png" alt="Insert image description here" /&gt;
&lt;a href="https://www.slideshare.net/AmitBhalla2/less10-undo-15946188" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/AmitBhalla2/less10-undo-15946188&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL has no undo mechanism. To ensure transaction rollback, old tuples remain on the table. For example, an update inserts a new row while the old data stays in place. Tuple headers, clog, etc. determine which tuple version is valid. Visibility information in tuple headers includes xmin, xmax, cmin, cmax, infomask, and infomask2, stored in the tuple header.&lt;/p&gt;</description><content:encoded>&lt;p&gt;Interview questions source: PostgreSQL Apprentice &lt;a href="https://mp.weixin.qq.com/s/DCmO1E31JAbec1M05y2_UQ" target="_blank" rel="noreferrer"&gt;PostgreSQL Interview Questions Collection&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Existing answers: Hehuyi_In &lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/128885660" target="_blank" rel="noreferrer"&gt;Learning and Answering PostgreSQL Interview Questions&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;1. MVCC Implementation and Differences from Oracle
 &lt;div id="1-mvcc-implementation-and-differences-from-oracle" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1-mvcc-implementation-and-differences-from-oracle" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ORACLE and MYSQL both use UNDO to implement multi-version concurrency control. Undo entries are recorded in &lt;strong&gt;additional&lt;/strong&gt; undo tablespaces. If the UNDO segment is insufficient, an ora-01555 error occurs.



&lt;img src="https://lastdba.com/img/csdn/fec3e1c0263f.png" alt="Insert image description here" /&gt;
&lt;a href="https://www.slideshare.net/AmitBhalla2/less10-undo-15946188" target="_blank" rel="noreferrer"&gt;https://www.slideshare.net/AmitBhalla2/less10-undo-15946188&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;PostgreSQL has no undo mechanism. To ensure transaction rollback, old tuples remain on the table. For example, an update inserts a new row while the old data stays in place. Tuple headers, clog, etc. determine which tuple version is valid. Visibility information in tuple headers includes xmin, xmax, cmin, cmax, infomask, and infomask2, stored in the tuple header.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f34dabdc091c.png" alt="Insert image description here" /&gt;
&lt;a href="https://www.interdb.jp/pg/pgsql05/03.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05/03.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Pros/cons: The undo approach requires extra undo space; space management is simpler. However, large transaction rollback is very troublesome since undo segments must be rolled back. The new-tuple approach makes large transaction rollback very fast, but this method creates dead tuples, requiring a vacuum mechanism to clean them. Vacuum freeze itself isn&amp;rsquo;t directly related to dead tuple cleanup (though both are vacuum processes); freeze prevents transaction ID wraparound.&lt;/p&gt;

&lt;h3 class="relative group"&gt;2. Why Table Bloat Occurs and Its Hazards
 &lt;div id="2-why-table-bloat-occurs-and-its-hazards" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#2-why-table-bloat-occurs-and-its-hazards" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Why table bloat?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As above, due to PostgreSQL&amp;rsquo;s unique MVCC mechanism, delete doesn&amp;rsquo;t truly remove tuples, and update equals delete+insert. Old tuples cannot be removed by DML statements, so space only &amp;ldquo;grows&amp;rdquo; without &amp;ldquo;cleaning&amp;rdquo; — this is table bloat. Vacuum is generally needed to clean dead tuples and mark space as available; or vacuum full rewrites the table for compaction.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Hazards of table bloat:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Excessive table space usage&lt;/li&gt;
&lt;li&gt;SQL performance degradation&lt;/li&gt;
&lt;li&gt;Large tables cause longer vacuum cleanup times; vacuum full blocking time also increases, though pg_repack can replace vacuum full to reduce blocking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Handling table bloat:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Manual vacuum&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Does not block queries or DML operations&lt;/li&gt;
&lt;li&gt;Does not immediately reclaim space, only marks it as available&lt;/li&gt;
&lt;li&gt;If the last page of a table has no tuples, that page gets truncated&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4bcffb429099.png" alt="Insert image description here" /&gt;
(&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;)&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Autovacuum&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Autovacuum automatically invokes vacuum for concurrent cleanup as needed&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Manual vacuum full&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;8-level lock, blocks everything&lt;/li&gt;
&lt;li&gt;Table is completely rewritten; corresponding OS files are cleaned and rebuilt&lt;/li&gt;
&lt;li&gt;Rebuilds indexes, FSM (free space map), VM (visibility map)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5c9458f68c2e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;pg_repack and other manual table rebuilds&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;pg_repack only has a brief lock during the final table switch&lt;/li&gt;
&lt;li&gt;Other tools with data sync and switch capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Avoiding table bloat:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Generally, autovacuum handles table bloat, but cleanup may not proceed smoothly in some scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Autovacuum worker isn&amp;rsquo;t running&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Both &lt;code&gt;autovacuum&lt;/code&gt; and &lt;code&gt;track_counts&lt;/code&gt; must be enabled for autovacuum to work&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autovacuum_max_workers&lt;/code&gt; must be set high enough; multiple workers may be needed simultaneously&lt;/li&gt;
&lt;li&gt;Table hasn&amp;rsquo;t reached vacuum threshold — rows deleted/updated: threshold = &lt;code&gt;autovacuum_vacuum_threshold&lt;/code&gt; + &lt;code&gt;autovacuum_vacuum_scale_factor&lt;/code&gt; * tuples&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autovacuum_vacuum_insert_threshold&lt;/code&gt; and &lt;code&gt;autovacuum_vacuum_insert_scale_factor&lt;/code&gt; represent insert thresholds (same algorithm). Insert-triggered vacuum thresholds theoretically have little to do with bloat cleanup since inserts don&amp;rsquo;t generate dead tuples. However, to prevent wraparound issues from not being handled in time, pg13 added this parameter (reference: &lt;a href="https://www.cybertec-postgresql.com/en/postgresql-autovacuum-insert-only-tables/" target="_blank" rel="noreferrer"&gt;postgresql-autovacuum-insert-only-tables&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;autovacuum_naptime&lt;/code&gt; is the autovacuum launcher cycle. If set too large, &lt;code&gt;autovacuum_max_workers&lt;/code&gt; may be sufficient and tables may meet thresholds, but the launcher hasn&amp;rsquo;t woken workers&lt;/li&gt;
&lt;li&gt;&lt;code&gt;vacuum_defer_cleanup_age&lt;/code&gt; delays vacuum cleanup by N transactions (originally designed to alleviate standby query conflicts; since &lt;code&gt;hot_standby_feedback&lt;/code&gt; and replication slots exist, pg16 removed this parameter)&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Disable or adjust cost-based vacuuming to make autovacuum faster&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Cost-based vacuuming may be enabled to reduce vacuum&amp;rsquo;s IO impact. When vacuum/autovacuum reaches the cost limit, it sleeps for &lt;code&gt;autovacuum_vacuum_cost_delay&lt;/code&gt; (or &lt;code&gt;vacuum_cost_delay&lt;/code&gt;) milliseconds. &lt;code&gt;vacuum_cost_delay&lt;/code&gt; defaults to 0 (disabling cost-based vacuuming); &lt;code&gt;autovacuum_vacuum_cost_delay&lt;/code&gt; at -1 means using the &lt;code&gt;vacuum_cost_delay&lt;/code&gt; setting. Disable delay or reduce the delay value&lt;/li&gt;
&lt;li&gt;If cost-based vacuuming is enabled, reasonably increase &lt;code&gt;vacuum_cost_limit&lt;/code&gt; trigger threshold and reduce the &lt;code&gt;vacuum_cost_page_dirty&lt;/code&gt;, &lt;code&gt;vacuum_cost_page_miss&lt;/code&gt;, &lt;code&gt;vacuum_cost_page_hit&lt;/code&gt; values that count toward the limit&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Active transactions preventing vacuum&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Business long transactions not finished. Application-side transactions shouldn&amp;rsquo;t run too long; database-side can kill sessions: 1) manual kill 2) set &lt;code&gt;idle_in_transaction_session_timeout&lt;/code&gt; to limit idle time 3) set &lt;code&gt;old_snapshot_threshold&lt;/code&gt; to limit SQL execution (not recommended before PG14)&lt;/li&gt;
&lt;li&gt;Unclosed cursors&lt;/li&gt;
&lt;li&gt;&lt;code&gt;hot_standby_feedback&lt;/code&gt; enabled: primary records catalog_xmin, standby long queries prevent primary cleanup&lt;/li&gt;
&lt;li&gt;Remove unused replication slots&lt;/li&gt;
&lt;li&gt;Orphan transactions. Prepared transactions are explicit 2PC transactions inside PG. If a prepared transaction is opened but not completed, and prepared transactions are unrelated to sessions, orphan transactions block indefinitely&lt;/li&gt;
&lt;li&gt;pg_dump logical backup opens implicit repeatable read isolation level; transaction not finished&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Performance aspects&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;maintenance_work_mem&lt;/code&gt; is memory for maintenance operations like vacuum; default 64MB can be increased. Or use &lt;code&gt;autovacuum_work_mem&lt;/code&gt; separately for autovacuum workers; default -1 means using &lt;code&gt;maintenance_work_mem&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Large table vacuum is especially slow; since vacuum can&amp;rsquo;t parallelize on the same table, convert large tables to partitioned tables so vacuum can run in parallel across partitions&lt;/li&gt;
&lt;li&gt;Good IO system&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Adjust per-table autovacuum parameters&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Global autovacuum settings may not suit certain business tables; adjust per-table autovacuum parameters to increase vacuum trigger probability&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="6"&gt;
&lt;li&gt;Manual vacuum&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;Autovacuum is generally unpredictable; for special business tables, manual vacuum&lt;/li&gt;
&lt;li&gt;Run manual vacuum during low-traffic periods, optionally with freeze and analyze&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The above handles 99.99% of table bloat problems. One type of bloat is harder to address: &lt;strong&gt;with cost-based vacuuming disabled, autovacuum dead tuple cleanup speed cannot keep up with generation speed&lt;/strong&gt;. Essentially, too many concurrent update (or insert+delete) transactions mean this round of vacuum hasn&amp;rsquo;t finished cleaning available space before massive updates generate new space and dead tuples, causing continuous bloat. Solutions:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Convert to partitioned tables for vacuum parallelism (only meaningful if updates are distributed across partitions)&lt;/li&gt;
&lt;li&gt;Run vacuum full or pg_repack during off-peak hours to thoroughly clean table holes&lt;/li&gt;
&lt;li&gt;24/7 high-concurrency tables are unlikely; if they exist, restructure to multi-table writes or move to caching systems like Redis&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&amp;amp;mid=2247485791&amp;amp;idx=1&amp;amp;sn=24ef88bd19d923d60fdf1a8969577fb0&amp;amp;chksm=fa66216ecd11a8789ee0a9a4b7e850d98086bf3ae542aad814788bd8f262dd675be92fa7c5db&amp;amp;mpshare=1&amp;amp;scene=1&amp;amp;srcid=0204hKVlhPOQa19uoxv7u3Ch&amp;amp;sharer_sharetime=1675514289096&amp;amp;sharer_shareid=1a32625a0cee9a1f3987aa62eea3fa03&amp;amp;exportkey=n_ChQIAhIQQ4H0Z5qjGf21zcqa8OvAKxKZAgIE97dBBAEAAAAAAMSqIP5cR5AAAAAOpnltbLcz9gKNyK89dVj0uMOj41SOhYI%2BA5Y3sbSQytf8OotyHqqED8OFC4Tealz7gt91%2FbaCaExVHDNExUGj%2FFrrrwQo6a3qGtJdUptL6vyG2pb9G0NKzNyuv1JbQq%2FLbX9LgTeCARhtml2oCiD%2FLpZJmHpbgRccjrjZCVmQ6oCACKTTSh1P2mfSJbPk7MwCYzdshC3CxYaXemFbwoL9u9tM2H36%2FYBpOLW4wJiSI54CgHscZ%2FeSZfNwaHsn99iojWcG11b204NEjkMmpFgKOq%2F%2FJDMJu0ZwZaQRaLfoLZ5H%2FOgmJOeUQMrp%2Bc7A7UROn7%2BWTGJct6i3l9jJd44OTjyu&amp;amp;acctmode=0&amp;amp;pass_ticket=UiziakVvQcg3ztgfB%2Bovewae4j0ijakENPH%2BRT8lyhXyARWs5hjeT%2FDsPN2ithp8%2B5Wqbk2ySDdewyfjSC2BMg%3D%3D&amp;amp;wx_header=0#rd" target="_blank" rel="noreferrer"&gt;Unveiling the Mystery of Table Bloat&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql06.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql06.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/routine-vacuuming.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/routine-vacuuming.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/runtime-config-autovacuum.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/runtime-config-autovacuum.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/runtime-config-resource.html#GUC-VACUUM-COST-DELAY" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/runtime-config-resource.html#GUC-VACUUM-COST-DELAY&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;3. Long Transaction Hazards and How to Trace Them
 &lt;div id="3-long-transaction-hazards-and-how-to-trace-them" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#3-long-transaction-hazards-and-how-to-trace-them" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Regular queries don&amp;rsquo;t generate transaction IDs but virtual transaction IDs (vxid). Virtual transaction IDs consist of backendID and a backend-local counter, unrelated to transaction ID (XID). However, although queries don&amp;rsquo;t generate transaction IDs, they hold snapshots for visibility checks. Snapshots contain tuple xmin and other information.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9b24ddaad8e7.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://www.interdb.jp/pg/pgsql05/05.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql05/05.html&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;So long transaction issues involve both DML and query statements, though their lock types differ.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Long transaction hazards:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Blocks vacuum cleanup, causing table bloat, excessive space usage, and SQL performance degradation&lt;/li&gt;
&lt;li&gt;Blocks other lock requests; e.g., DDL must check for long transactions before execution, otherwise long waits for higher-level locks cause lock escalation&lt;/li&gt;
&lt;li&gt;Long transactions cause create index concurrently to fail, leaving invalid indexes&lt;/li&gt;
&lt;li&gt;Occupies connection pool (though mainly a long-connection issue)&lt;/li&gt;
&lt;li&gt;Logical decoding data spilling to disk causing replication lag, also related to large transactions&lt;/li&gt;
&lt;li&gt;A long transaction with a savepoint subtransaction can cause query performance cliffs (reference: &lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;Why we spent the last month eliminating PostgreSQL subtransactions&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;How to trace long transactions:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pg_stat_activity: check xact_start for transaction start time, state_change for whether transaction is still running&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;4. Subtransaction Hazards and Considerations
 &lt;div id="4-subtransaction-hazards-and-considerations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#4-subtransaction-hazards-and-considerations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Subtransaction hazards:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Excessive transaction ID consumption, premature wraparound handling. Each subtransaction consumes one XID&lt;/li&gt;
&lt;li&gt;PGPROC_MAX_CACHED_SUBXIDS overflow causing performance degradation. Each backend has a subtransaction cache of &lt;code&gt;PGPROC_MAX_CACHED_SUBXIDS&lt;/code&gt;, fixed at 64 subtransactions (hardcoded). Exceeding 64 subtransactions spills to the &lt;code&gt;pg_subtrans&lt;/code&gt; directory (reference: &lt;a href="https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful" target="_blank" rel="noreferrer"&gt;PostgreSQL Subtransactions Considered Harmful&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Using subtransactions with FOR UPDATE explicit row locks causes dramatic database performance degradation (reference: &lt;a href="https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/" target="_blank" rel="noreferrer"&gt;Notes on some PostgreSQL implementation details&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;A long transaction with a savepoint subtransaction can also cause query performance cliffs (reference: &lt;a href="https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/" target="_blank" rel="noreferrer"&gt;Why we spent the last month eliminating PostgreSQL subtransactions&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Usage recommendations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Subtransaction usage is discouraged given the above hazards&lt;/li&gt;
&lt;li&gt;If standby query workloads exist, prohibit subtransactions&lt;/li&gt;
&lt;li&gt;If subtransactions are still needed, keep them under 64 (preferably much lower)&lt;/li&gt;
&lt;li&gt;Besides explicit savepoints, subtransactions can also arise from exceptions, frameworks, and tools&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://liuzhilong.blog.csdn.net/article/details/130783474" target="_blank" rel="noreferrer"&gt;pg事务：子事务&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;5. Which Schema Changes Are Non-Online
 &lt;div id="5-which-schema-changes-are-non-online" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#5-which-schema-changes-are-non-online" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;All schema changes are non-online because all ALTER TABLE operations require an 8-level lock. However, some schema changes themselves take a long time or cause slow queries afterward. So this question can be reframed as three sub-questions:&lt;/p&gt;
&lt;p&gt;Impact on indexes? Impact on statistics? Does it require rewriting the table, causing long-held 8-level locks?&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7b272ed64104.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/cg1tXiifC83p0hWMs92Cxw" target="_blank" rel="noreferrer"&gt;Schema Change Summary Chart&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dropping a column completes immediately, but watch for composite index and multi-column statistics invalidation to avoid SQL performance avalanches&lt;/li&gt;
&lt;li&gt;Adding a column with a default value: 1) Pre-pg10 requires table rewrite 2) pg11+: only volatile function defaults require table rewrite. Also, statistics won&amp;rsquo;t be immediately available for the new column&lt;/li&gt;
&lt;li&gt;Changing column length: enlarging (except int to bigint) doesn&amp;rsquo;t rewrite the table; shrinking requires table rewrite; column statistics invalidated&lt;/li&gt;
&lt;li&gt;Changing column type: &lt;em&gt;table rewrite&lt;/em&gt;; statistics invalidated&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Adding constraints to existing columns scans the table, watch for scan duration&lt;/em&gt; (e.g., &lt;code&gt;ADD CONSTRAINT&lt;/code&gt;, &lt;code&gt;SET NOT NULL&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Adding defaults to existing columns completes immediately&lt;/em&gt; (e.g., &lt;code&gt;SET/DROP DEFAULT&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;SET { LOGGED | UNLOGGED } rewrites the table&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Storage parameter changes depend on what&amp;rsquo;s changing. E.g., fillfactor and autovacuum parameters are online, non-8-level-lock, immediate (reference: &lt;a href="https://www.postgresql.org/docs/16/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS" target="_blank" rel="noreferrer"&gt;Storage Parameters&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;6. Physical Backup Considerations (pg_start_backup)
 &lt;div id="6-physical-backup-considerations-pg_start_backup" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#6-physical-backup-considerations-pg_start_backup" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a226f3f1899f.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://postgrespro.com/media/2022/03/24/pgpro-backup-methods%20%281%29.pdf" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/media/2022/03/24/pgpro-backup-methods%20(1).pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PG physical backup:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Block-level backup, generally doesn&amp;rsquo;t support per-database backup (except pg_probackup)&lt;/li&gt;
&lt;li&gt;Exclusive mode is unnecessary because: 1) only works on primary 2) doesn&amp;rsquo;t allow parallel backup 3) created backup label may prevent primary instance recovery 4) functionally identical to non-exclusive backup. PG9.6 added non-exclusive mode; PG15 removed exclusive mode&lt;/li&gt;
&lt;li&gt;If explicitly using pg_start_backup(), must explicitly use pg_stop_backup() to end backup mode (function names differ slightly in PG15+)&lt;/li&gt;
&lt;li&gt;FPI (full page image) is force-enabled during backup, even if full_page_writes is off&lt;/li&gt;
&lt;li&gt;All tools (maybe) call pg_stop_backup() before backup starts for a checkpoint to flush dirty data, and back up all WAL from start to end, even newly generated WAL during backup, ensuring data consistency and PITR&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;pg_basebackup:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Native, built-in&lt;/li&gt;
&lt;li&gt;Wraps pg_start_backup and pg_stop_backup commands&lt;/li&gt;
&lt;li&gt;PG17+ supports incremental backup and backup set merging&lt;/li&gt;
&lt;li&gt;Consumes one walsender process&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;pg_probackup:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Very powerful: supports incremental backup, incremental restore, parallelism, backup set merging, backup verification, remote backup, per-database restore, etc.&lt;/li&gt;
&lt;li&gt;BUG: address space cannot exceed 4GB, fixable by modifying source code&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;pgBackRest:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Also very powerful&lt;/li&gt;
&lt;li&gt;Prerequisite: SSH must be configured from backup server to database host&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://developer.aliyun.com/article/59359" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/59359&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/app-pgbasebackup.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/app-pgbasebackup.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.enterprisedb.com/blog/exclusive-backup-mode-finally-removed-postgres-15" target="_blank" rel="noreferrer"&gt;https://www.enterprisedb.com/blog/exclusive-backup-mode-finally-removed-postgres-15&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/MasaoFujii/pg_exclusive_backup" target="_blank" rel="noreferrer"&gt;https://github.com/MasaoFujii/pg_exclusive_backup&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/postgrespro/pg_probackup" target="_blank" rel="noreferrer"&gt;https://github.com/postgrespro/pg_probackup&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pgbackrest.org/user-guide.html" target="_blank" rel="noreferrer"&gt;https://pgbackrest.org/user-guide.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;7. How Logical Backup Ensures Consistency
 &lt;div id="7-how-logical-backup-ensures-consistency" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#7-how-logical-backup-ensures-consistency" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;pg_dump completes a full backup within a single transaction, with isolation level serializable or repeatable read&lt;/li&gt;
&lt;li&gt;Before backing up data, pg_dump acquires ACCESS SHARE locks on target objects to prevent table drops&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additional logical backup considerations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Watch for lock conflicts during export&lt;/li&gt;
&lt;li&gt;If DDL operations are needed, avoid full-database or long-duration backups; split the backup into multiple tasks, e.g., one table per pg_dump invocation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://developer.aliyun.com/article/14582" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/14582&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;8. Causes of WAL Accumulation
 &lt;div id="8-causes-of-wal-accumulation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#8-causes-of-wal-accumulation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Invalid replication slots&lt;/li&gt;
&lt;li&gt;Logical replication with long transactions&lt;/li&gt;
&lt;li&gt;Excessively large wal_keep_size&lt;/li&gt;
&lt;li&gt;Excessively small archive_timeout, forcing WAL switches and archiving (equivalent to pg_switch_xlog() + archiving)&lt;/li&gt;
&lt;li&gt;Archive failures generating .ready files&lt;/li&gt;
&lt;li&gt;Single-process archiving can&amp;rsquo;t keep up&lt;/li&gt;
&lt;li&gt;FPI full page writes (check for overly frequent checkpoints, UUID-like scattered write patterns)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;9. Hazards of Long Connections
 &lt;div id="9-hazards-of-long-connections" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#9-hazards-of-long-connections" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;When PG acquires snapshot data, it must scan all backend process transaction states. Too many connections degrade performance (recommended max ~1000; pg14 optimized but still not recommended to exceed)&lt;/li&gt;
&lt;li&gt;relcache/syscache doesn&amp;rsquo;t release cached metadata, and each process caches independently, causing high memory consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;10. Role of Infomask Flags
 &lt;div id="10-role-of-infomask-flags" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#10-role-of-infomask-flags" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Infomask provides transaction, lock, and tuple status information, such as whether a transaction is committed/aborted, row lock info, HOT info, column count, etc.&lt;/li&gt;
&lt;li&gt;The header has two infomasks: &lt;code&gt;infomask&lt;/code&gt; and &lt;code&gt;infomask2&lt;/code&gt;. They store different information, with different bits representing different meanings&lt;/li&gt;
&lt;li&gt;Hint bits also write transaction info to infomask, so visibility can be determined from tuple headers alone without accessing clog&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://liuzhilong.blog.csdn.net/article/details/130782857?spm=1001.2014.3001.5502" target="_blank" rel="noreferrer"&gt;pg事务：事务相关元组结构&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;11. How NULL Values Are Stored and Whether Indexes Store NULLs
 &lt;div id="11-how-null-values-are-stored-and-whether-indexes-store-nulls" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#11-how-null-values-are-stored-and-whether-indexes-store-nulls" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;How NULL values are stored:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f3fc29d1f5cd.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;NULL is stored in the tuple header, not the tuple data area&lt;/li&gt;
&lt;li&gt;One bit in infomask marks whether the tuple contains NULLs&lt;/li&gt;
&lt;li&gt;t_bits has n*8 bits (n integer; e.g., a 10-column table has 16-bit t_bits), with a bitmap representing which columns are NULL&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Whether indexes store NULL values:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;PostgreSQL indexes store NULL values; Oracle indexes don&amp;rsquo;t&lt;/li&gt;
&lt;li&gt;Storage position depends on (NULLS FIRST or NULLS LAST)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://www.highgo.ca/2020/10/20/the-way-to-store-null-value-in-pg-record/" target="_blank" rel="noreferrer"&gt;https://www.highgo.ca/2020/10/20/the-way-to-store-null-value-in-pg-record/&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;12. Why Full Page Writes Are Needed
 &lt;div id="12-why-full-page-writes-are-needed" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#12-why-full-page-writes-are-needed" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The official documentation&amp;rsquo;s introduction to full page writes is fairly general:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;This is needed because a page write that is in process during an operating system crash might be only partially completed, leading to an on-disk page that contains a mix of old and new data. The row-level change data normally stored in WAL will not be enough to completely restore such a page during post-crash recovery. Storing the full page image guarantees that the page can be correctly restored, but at the price of increasing the amount of data that must be written to WAL. (Because WAL replay always starts from a checkpoint, it is sufficient to do this during the first change of each page after a checkpoint)&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;OS file pages are typically 4KB, while PG pages are typically 8KB. Partial writes can occur, where a disk data page contains both old and new data, causing data loss during recovery. Hence the need for full page writes.&lt;/p&gt;
&lt;p&gt;Partial writes are closely related to disk characteristics. Detailed answers are difficult; reference &lt;a href="http://www.killdb.com/2020/04/05/double_write_partial_write_oracle_mysql_postgresql/" target="_blank" rel="noreferrer"&gt;roger&amp;rsquo;s article&lt;/a&gt;. Summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partial writes relate to whether the disk supports atomic writes&lt;/li&gt;
&lt;li&gt;Partial writes relate to whether OS block size matches database block size. Oracle/PG blocks default to 8KB, MySQL to 16KB, OS to 4KB. A database&amp;rsquo;s minimum IO requires multiple OS calls&lt;/li&gt;
&lt;li&gt;For PG, if a &lt;strong&gt;data page&lt;/strong&gt; experiences partial write, it can recover using full page images in WAL&lt;/li&gt;
&lt;li&gt;For MySQL, there&amp;rsquo;s a double write mechanism. The double write buffer is on-disk space, written sequentially before data pages to mitigate partial write&lt;/li&gt;
&lt;li&gt;For Oracle, much work has been done but no obvious solution exists. However, Oracle supports block-level recovery to replace corrupted data blocks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Different DBs adopt different approaches to reduce partial writes. PG writes the entire data page to WAL logs, but this causes WAL write amplification. This can be mitigated through various means.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;How to perfectly solve the partial write problem?&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Atomic write-capable devices&lt;/li&gt;
&lt;li&gt;OS minimum IO matching database minimum IO&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="http://www.killdb.com/2020/04/05/double_write_partial_write_oracle_mysql_postgresql/" target="_blank" rel="noreferrer"&gt;http://www.killdb.com/2020/04/05/double_write_partial_write_oracle_mysql_postgresql/&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;13. Various Causes of Index Invalidation
 &lt;div id="13-various-causes-of-index-invalidation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#13-various-causes-of-index-invalidation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Index invalidation:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;CREATE INDEX CONCURRENTLY can leave an invalid index due to deadlock or unique index check failure; invalid indexes still get updated&lt;/li&gt;
&lt;li&gt;Invalid indexes on partitioned parent tables indicate some partitions have the index while others don&amp;rsquo;t&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Index not being used:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Inaccurate statistics&lt;/li&gt;
&lt;li&gt;Selectivity&lt;/li&gt;
&lt;li&gt;Data skew&lt;/li&gt;
&lt;li&gt;Soft parsing: first 5 times cached different execution plans&lt;/li&gt;
&lt;li&gt;Leftmost prefix principle&lt;/li&gt;
&lt;li&gt;Insufficient data (hash or full scan not slower than index)&lt;/li&gt;
&lt;li&gt;Functions (unless a matching immutable function index exists), implicit conversions, operations, LIKE with leading &amp;lsquo;%&amp;rsquo;&amp;hellip;&lt;/li&gt;
&lt;li&gt;Data type mismatch&lt;/li&gt;
&lt;li&gt;Collation mismatch (less of an issue in PG since database collation can&amp;rsquo;t change after creation; data within one database shares the same collation; cross-database access is normally impossible)&lt;/li&gt;
&lt;li&gt;SQL collation sort differing from index collation sort&lt;/li&gt;
&lt;li&gt;LIKE only usable with collation C or pattern index&lt;/li&gt;
&lt;li&gt;High correlation: index logical order vs data physical order correlation; accessing scattered data via index&lt;/li&gt;
&lt;li&gt;LIMIT xx ORDER BY column1, MIN/MAX needing TOP N scenarios where the optimizer chooses another index&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;14. Role of Commit Log
 &lt;div id="14-role-of-commit-log" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#14-role-of-commit-log" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Commit log records transaction status. During the next visibility check on a table, hint bits are triggered, writing clog transaction status to the tuple header.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why not write transaction status to the tuple header immediately?&lt;/strong&gt; Hint bits immediate update performs very poorly, so transaction status is first placed in clog, reducing PGXACT contention and improving performance.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782857" target="_blank" rel="noreferrer"&gt;pg事务：事务相关元组结构&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;15. Database Join Methods and Their Applicable Scenarios
 &lt;div id="15-database-join-methods-and-their-applicable-scenarios" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#15-database-join-methods-and-their-applicable-scenarios" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;1.1 Nested Loop Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/20abc423c1e9.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1,t3 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; lzl1.col1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;t3.a::text;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; Filter: ((lzl1.col1)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (t3.a)::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The driving table (outer in the diagram, first table in the plan) matches each row against every row of the driven table (inner, second table in the plan). The driving table is scanned once; the driven table is scanned N times (N = driving table rows).&lt;/p&gt;
&lt;p&gt;NL suits almost all scenarios; it&amp;rsquo;s the simplest brute-force join. Generally smaller tables serve as the driving table (actually neither table should be too large, unless other join types don&amp;rsquo;t apply).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1.2 Materialized Nested Loop Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2b45752abb3b.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testdb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tbl_a &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; a, tbl_b &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;750230&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; Filter: (a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;145&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Materialize (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the driven table (inner) needs multiple scans, physical IO each time would be very slow (and seems silly). Materialize scans the driven table into memory (work_mem), performing only one physical table scan, allowing the driven table to be accessed multiple times in memory.&lt;/p&gt;
&lt;p&gt;This scenario is very common in real-world workloads.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1.3 Indexed Nested Loop Join (inner indexed)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/661dba35e09a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testdb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tbl_c &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;, tbl_b &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1935&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; tbl_c_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_c &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;36&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;1.4 NL Variants&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d21a425177c0.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;All are essentially NL; the main variations are whether indexes are used on either table and whether Materialize is applied.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2.1 Merge Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9914756afd16.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testdb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tbl_a &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; a, tbl_b &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; b.id &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;944&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;984&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;71&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge Cond: (a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;809&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;834&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: a.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;145&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;135&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;137&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: b.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (id &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In merge join, both the driving and driven tables must be sorted first (both tables have Sort in the plan) before matching. Advantage: fewer table scans and matches than NL. Disadvantage: sorting required.&lt;/p&gt;
&lt;p&gt;Since indexes are sorted, and SQL may include DISTINCT, GROUP BY, SORT, MAX/MIN etc. requiring ordering, merge join is also common.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2.2 Materialized Merge Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fb637c9d769e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;testdb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXPLAIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; tbl_a &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; a, tbl_b &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10466&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10578&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;58&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2064&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge Cond: (a.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6708&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6733&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;39&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: a.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1529&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Materialize (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3757&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3782&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3757&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3770&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: b.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1193&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1032&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Materialize doesn&amp;rsquo;t reduce table scans (both tables scanned once), but the sort operation can happen in the backend&amp;rsquo;s work_mem for better efficiency; if exceeding work_mem, disk sort is used.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2.3 Merge Join Variants&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7cfae9b6cfff.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Similar to NL variants, mainly Materialize and index usage. When using indexes, since the index is inherently ordered, no extra sorting is needed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;135&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;61&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;322&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Merge Cond: (&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; b.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; tbl_c_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_c &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;318&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;135&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;33&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;137&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: b.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tbl_b b (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (id &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So indexes and Materialize are very common in merge joins.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3.1 Hash Join&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/53c4660e122d.png" alt="Insert image description here" /&gt;


&lt;img src="https://lastdba.com/img/csdn/fb09a2a553f8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Hash join consists of build and probe phases.&lt;/p&gt;
&lt;p&gt;The build phase places the driving table (inner in the diagram, second row in the plan!) into work_mem; the probe phase compares hash values.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hash join only possible with &amp;lsquo;=&amp;rsquo; conditions&lt;/li&gt;
&lt;li&gt;Hash join consumes memory; generally both tables aren&amp;rsquo;t very large&lt;/li&gt;
&lt;li&gt;Note: the driving table (hash build table) is the second row in the plan, opposite of NL&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;3.2 Hybrid Hash Join with Skew&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Not fully understood; appears to support spilling to disk. To be revisited.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/pgsql03/05/01.html" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/pgsql03/05/01.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;16. Applicable Scenarios for Various Index Types (HASH/GIN/BTREE/GIST/BLOOM/BRIN)
 &lt;div id="16-applicable-scenarios-for-various-index-types-hashginbtreegistbloombrin" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#16-applicable-scenarios-for-various-index-types-hashginbtreegistbloombrin" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;(1) BTREE&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f490c66d7714.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://en.wikibooks.org/wiki/PostgreSQL/Index_Btree" target="_blank" rel="noreferrer"&gt;https://en.wikibooks.org/wiki/PostgreSQL/Index_Btree&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Possible usage patterns:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;		 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;foo%&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;A meta node points to the root node&lt;/li&gt;
&lt;li&gt;Leaf node access complexity O(logN), N being row count&lt;/li&gt;
&lt;li&gt;Inherently sorted, easily used by ORDER BY, MIN/MAX, GROUP BY, merge joins, etc.&lt;/li&gt;
&lt;li&gt;Default index type, most common. Structure is similar across databases with leaf node structure differences (MySQL secondary index leaf nodes store index key + primary key, then access clustered index via primary key; Oracle index leaf nodes store index key + rowid; PG index leaf nodes store index key + tid)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;(2) HASH&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a7e8e0b28860.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://leopard.in.ua/2015/04/13/postgresql-indexes）&lt;/p&gt;
&lt;p&gt;Index data is converted to 32-bit hash values stored in corresponding hash buckets; different hash values point to their respective data rows.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Complexity O(1)&lt;/li&gt;
&lt;li&gt;Hash indexes can &lt;strong&gt;only&lt;/strong&gt; be used for &lt;code&gt;=&lt;/code&gt; conditions&lt;/li&gt;
&lt;li&gt;When key values are large, they&amp;rsquo;re generally smaller than BTREE indexes and don&amp;rsquo;t need character-by-character comparison like BTREE, offering better efficiency. So hash indexes suit scenarios with large key values&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;(3) GIST&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;GIST (Generalized Search Tree) is similar to BTREE, also a balanced tree. GIST isn&amp;rsquo;t actually one index type but a framework containing many index strategies: R-TREE, RD-TREE. Unlike BTREE using &lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt; etc. for numeric/character data, GIST excels at geographic, text, image, and similar data. Geographic operators include: &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; distance calculation, &lt;code&gt;&amp;lt;&amp;lt;&lt;/code&gt; left-of check, &lt;code&gt;@&amp;gt;&lt;/code&gt; contains check, etc.&lt;/p&gt;
&lt;p&gt;GIST excels at:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GIS data processing (similar data processing also possible, e.g., &lt;a href="https://pic.huodongjia.com/ganhuodocs/2017-07-15/1500104265.79.pdf" target="_blank" rel="noreferrer"&gt;digoal-GIST index for IP range query optimization&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Nearest-neighbor algorithms (pg_vector and similar vector data; to be researched)&lt;/li&gt;
&lt;li&gt;Full-text search (seems to need contrib/intarray)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;RTREE:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f63754993caa.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://en.wikipedia.org/wiki/R-tree）&lt;/p&gt;
&lt;p&gt;The most common index for GIS data is RTREE. Two-dimensional spatial data consists of coordinates; scanning coordinates one by one to find locations is slow. BTREE isn&amp;rsquo;t suitable for such data, so RTREE emerged. RTREE&amp;rsquo;s core concept is grouping nearby points using rectangles at different hierarchy levels; finer grouping yields more precise positioning.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/4175817" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4175817&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(4) SP-GIST:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Space-Partitioned GIST is similar to GIST, also an index creation framework. SP-GIST suits structures that partition space into non-overlapping regions (unlike RTREE which overlaps), such as quadtrees, k-d trees, and radix trees.&lt;/p&gt;
&lt;p&gt;Quadtrees:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2103e76b673a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://en.wikipedia.org/wiki/Quadtree）&lt;/p&gt;
&lt;p&gt;Q-TREE comes in square, rectangular, and various shapes. The most &amp;ldquo;orthodox&amp;rdquo; Q-TREE as shown above generally has these properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Each internal node has four children&lt;/li&gt;
&lt;li&gt;Index follows depth structure to locate data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;K-d trees:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/656f08fc9ac3.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/05d79891bd23.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://en.wikipedia.org/wiki/K-d_tree）&lt;/p&gt;
&lt;p&gt;K-dimensional trees manage multi-dimensional points using multi-dimensional space concepts; each non-leaf node is split in two. For example, the 3D space diagram above is a 3-dimensional k-d tree model: first split (red) divides the entire space in half; second split (green) divides subspaces in half&amp;hellip; until no further division is possible. The second diagram shows the tree structure of a 3D k-d tree (don&amp;rsquo;t mistake it for BTREE!); this tree has only 3 dimensions: Name, Age, Salary.&lt;/p&gt;
&lt;p&gt;Radix-tree:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/157f00ff6b48.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://en.wikipedia.org/wiki/Radix_tree）&lt;/p&gt;
&lt;p&gt;Radix: each child synthesizes its parent. Key lookup complexity is O(path length); if common prefixes exist, complexity is higher.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/4220639" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4220639&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(5) GIN&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;BTREE and GIST have very low query efficiency when there are very many key-value entries. GIN (Generalized Inverted Index) excels at such scenarios: array, full text, and JSON retrieval operations. Both GIST and GIN are generalized/framework-based, supporting multiple data index types; both also support full-text indexing. GIN only supports Bitmap scans.&lt;/p&gt;
&lt;p&gt;PostgreSQL natively supports many operators, some of which are GIN-related data type operators:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/16/functions-array.html" target="_blank" rel="noreferrer"&gt;Array operators&lt;/a&gt;, e.g., &lt;code&gt;@&amp;gt;&lt;/code&gt; whether array1 contains array2; &lt;code&gt;unnest&lt;/code&gt; expand array&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.postgresql.org/docs/16/functions-textsearch.html" target="_blank" rel="noreferrer"&gt;Full-text search operators&lt;/a&gt;, e.g., &lt;code&gt;@@&lt;/code&gt; whether tsvector matches tsquery&lt;/li&gt;
&lt;li&gt;Also some &lt;a href="https://www.postgresql.org/docs/16/functions-json.html" target="_blank" rel="noreferrer"&gt;JSON operators&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PG supports &lt;a href="https://www.postgresql.org/docs/16/datatype-textsearch.html" target="_blank" rel="noreferrer"&gt;two data types for full-text search&lt;/a&gt;: tsvector and tsquery&lt;/p&gt;
&lt;p&gt;&lt;em&gt;1. tsvector&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;tsvector tokenizes text with &lt;strong&gt;deduplication and sorting&lt;/strong&gt;, using tsvector_ops operators. Example tokenization:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;::tsvector;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tsvector 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;Fat&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;Rat&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;The&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;is&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;::tsvector tokenization is generally not the final form; to_tsvector normalizes tokens (final form), showing token positions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;english&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; to_tsvector 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;fat&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;rat&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note &amp;rsquo;the&amp;rsquo;, &amp;lsquo;is&amp;rsquo;, &amp;lsquo;a&amp;rsquo;, and case are all removed — this is to_tsvector&amp;rsquo;s rule, matching real-world scenarios since full-text search typically targets words.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;2. tsquery&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;Normally you can search tokenized text by word:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;@@&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;rat&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;To search for &amp;ldquo;contains both fat and rat&amp;rdquo;, simple word input won&amp;rsquo;t work — tsquery operates on &lt;em&gt;the tokens being searched&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;tsquery can be composed with &lt;code&gt;&amp;amp;&lt;/code&gt; (AND), &lt;code&gt;|&lt;/code&gt; (OR), &lt;code&gt;!&lt;/code&gt; (NOT), &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; (FOLLOWED BY). Examples:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;@@&lt;/span&gt; to_tsquery( &lt;span style="color:#e6db74"&gt;&amp;#39;fat&amp;amp;rat&amp;#39;&lt;/span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;@@&lt;/span&gt; to_tsquery( &lt;span style="color:#e6db74"&gt;&amp;#39;fat&amp;amp;rat&amp;amp;cat&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; to_tsvector(&lt;span style="color:#e6db74"&gt;&amp;#39;The Fat Rat is a Rat&amp;#39;&lt;/span&gt;) &lt;span style="color:#f92672"&gt;@@&lt;/span&gt; to_tsquery( &lt;span style="color:#e6db74"&gt;&amp;#39;rat&amp;lt;-&amp;gt;fat&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;?&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;column&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Fulltext GIN:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Full-text GIN indexes first tokenize the indexed field (to_tsvector). Example: doc_tsv below is the tokenized state of &lt;code&gt;left&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; doc_tsv 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------+----------------------+---------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Can a sheet slitter &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; How many sheets coul &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;could&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;mani&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; I slit a sheet, a sh &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Upon a slitted sheet &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;upon&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Whoever slit the she &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;good&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;whoever&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; I am a sheet slitter &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; I slit sheets. &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; I am the sleekest sh &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;ever&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sleekest&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slitter&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; She slits the sheet &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sheet&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;sit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;slit&amp;#39;&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then indexing by tokens and their ctids:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/815fee8ad284.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://postgrespro.com/blog/pgsql/4261647" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4261647&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;The index is sorted by token order, similar to BTREE; leaf nodes store ctids pointed to by tokens. Since the same token can come from multiple tuples, a token can point to multiple ctids. When multiple ctids exist, a posting tree is built — essentially a BTREE of ctids within.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fulltext GIN addressing:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;for &amp;ldquo;mani&amp;rdquo; — (0,2).
for &amp;ldquo;slitter&amp;rdquo; — (0,1), (0,2), (1,2), (1,3), (2,2).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/49ae172a7923.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GIN updates:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Updating (insert/update/delete) a text generally requires updating many places in the GIN index because:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;One text can have many tokens scattered across GIN index branches&lt;/li&gt;
&lt;li&gt;One token may contain multiple ctids since many texts share that token&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This makes GIN updates very expensive. Batch updates are typically better than row-by-row updates since some tokens are shared, reducing update work.&lt;/p&gt;
&lt;p&gt;Besides batch updates, GIN provides fast update functionality (fastupdate = true):&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/891a4e0ed575.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf）&lt;/p&gt;
&lt;p&gt;GIN fast update:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Incrementally updated data goes to a separate, unsorted area&lt;/li&gt;
&lt;li&gt;When vacuum runs or the list reaches &lt;code&gt;gin_pending_list_limit&lt;/code&gt;, incremental updates are written back to the main GIN index&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;GiST or GIN?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Both GiST and GIN are generalized index frameworks supporting full-text indexing, but their full-text index structures are completely different. GIST suits geographic and multi-dimensional spatial data; GIN mainly indexes scenarios where a key contains multiple values, such as arrays, full text, JSON.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;GIN indexes are faster than GiST; generally, full-text indexing can blindly choose GIN (reference: &lt;a href="https://leopard.in.ua/2015/04/13/postgresql-indexes" target="_blank" rel="noreferrer"&gt;GIST vs GIN&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Only with very frequent updates should GiST be considered, assuming fast update strategy can&amp;rsquo;t solve the update problem (e.g., configuring nightly write-back strategy). Better to compare GiST and GIN for various full-text indexing scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/datatype-textsearch.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/datatype-textsearch.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/4261647" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4261647&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(6) BRIN&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0ee340aa3ea8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://postgrespro.com/blog/pgsql/5967830）&lt;/p&gt;
&lt;p&gt;BRIN is not a tree-type index. Data is grouped in multiple pages (or blocks) as one range (similar to range partition but not physically partitioned). The table is divided into ranges, hence the name Block Range Index (BRIN).&lt;/p&gt;
&lt;p&gt;The most critical BRIN component is the revmap layer, which stores only key value ranges and ctids, &lt;strong&gt;not the key values themselves&lt;/strong&gt;. This is why BRIN indexes are very small — storing key values would make it like a branch-less BTREE.&lt;/p&gt;
&lt;p&gt;Since only key value ranges and ctids are stored, data lookup requires accessing all data pages pointed to by matching revmap pages, then rechecking for final data rows.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;															QUERY PLAN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; flights_bi (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;151&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;192&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;210&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;587353&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: (airport_utc_offset &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;08:00:00&amp;#39;&lt;/span&gt;::interval)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;191318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Heap Blocks: lossy&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13380&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; flights_bi_airport_utc_offset_idx
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;999&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;999&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;133800&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (airport_utc_offset &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;08:00:00&amp;#39;&lt;/span&gt;::interval)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Whether index key order matches storage order is critical. For example, non-sequentially stored extra key value data may be on &amp;ldquo;distant&amp;rdquo; pages, requiring extra IO to access distant data pages. Worst case, it may scan the entire table:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/46ee8f7372ff.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf）&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BRIN suitable scenarios:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;BRIN indexes only suit data where index key order is highly consistent with storage order. Check the column&amp;rsquo;s correlation in pg_stats — should approach 1 (maybe -1 also works?), typically auto-increment primary keys and timestamp columns&lt;/li&gt;
&lt;li&gt;Nearly no update scenarios. Updates may reduce correlation&lt;/li&gt;
&lt;li&gt;BRIN indexes generally suit very large data, especially TB-scale and beyond&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5967830" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5967830&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(7) RUM&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;RUM is an extension, not natively included in PG. RUM and GIN indexes are similar except RUM additionally stores tsvector position information.&lt;/p&gt;
&lt;p&gt;Although GIN requires to_tsvector() (or direct tsvector) for tokenization, GIN doesn&amp;rsquo;t use the position information from to_tsvector(). For example, finding the distance between two tokens can&amp;rsquo;t be done with GIN — only via raw to_tsvector() data. RUM handles this.&lt;/p&gt;
&lt;p&gt;RUM indexes attach token position information alongside ctids, compared to GIN:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9c5cdfb1d385.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://postgrespro.com/blog/pgsql/4262305）&lt;/p&gt;
&lt;p&gt;RUM, similar to GIN, suits full-text indexing, with additional capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Distance operators (e.g., &amp;lt;=&amp;gt;) for distance calculation&lt;/li&gt;
&lt;li&gt;Position-based sorting&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/4262305" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/4262305&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(8) BLOOM&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A Bloom filter quickly determines whether an element is in a set. Bloom filters can have false positives — &amp;ldquo;in set&amp;rdquo; isn&amp;rsquo;t guaranteed true, but &amp;ldquo;not in set&amp;rdquo; is guaranteed true. BLOOM indexes are also non-tree, flat structures (requiring recheck like BRIN).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/bf06b10cd015.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://en.wikipedia.org/wiki/Bloom_filter）&lt;/p&gt;
&lt;p&gt;Bloom indexes can index many columns. Similar to hash indexes, but unlike hash indexes, they can specify hashed fields and combine them, with total length limited by the &lt;code&gt;length&lt;/code&gt; parameter. Because of the segmented hashing and truncation, false positives exist. Shorter length means higher false positive probability (max length 4096 bits).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; ... &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; bloom(...) &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;length&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;..., col1&lt;span style="color:#f92672"&gt;=&lt;/span&gt;..., col2&lt;span style="color:#f92672"&gt;=&lt;/span&gt;..., ...);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/93a3ccefbd2d.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://postgrespro.com/blog/pgsql/5967832）&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/current/bloom.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/bloom.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5967832" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5967832&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Index Type&lt;/th&gt;
 &lt;th&gt;Structure&lt;/th&gt;
 &lt;th&gt;Operators&lt;/th&gt;
 &lt;th&gt;Access Complexity&lt;/th&gt;
 &lt;th&gt;Native?&lt;/th&gt;
 &lt;th&gt;Ordered?&lt;/th&gt;
 &lt;th&gt;Accurate?&lt;/th&gt;
 &lt;th&gt;Applicable Scenarios&lt;/th&gt;
 &lt;th&gt;Advantages&lt;/th&gt;
 &lt;th&gt;Disadvantages&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;btree&lt;/td&gt;
 &lt;td&gt;btree; branch stores key ranges, leaf nodes store keys and ctids, generally ascending&lt;/td&gt;
 &lt;td&gt;&amp;gt;=, =, IS NULL etc. common operators; leftmost prefix rule&lt;/td&gt;
 &lt;td&gt;O(logN)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;High selectivity scenarios; not suitable for too-large data&lt;/td&gt;
 &lt;td&gt;Fits most scenarios; no extra sorting needed&lt;/td&gt;
 &lt;td&gt;Large key values make index very large; index fragmentation/splitting (HOT mitigates)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;hash&lt;/td&gt;
 &lt;td&gt;Builds hash buckets; different hash values point to different rows&lt;/td&gt;
 &lt;td&gt;Only =&lt;/td&gt;
 &lt;td&gt;O(1)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Only = condition scenarios; large key values&lt;/td&gt;
 &lt;td&gt;Generally small; fast access&lt;/td&gt;
 &lt;td&gt;Very narrow use case&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GiST&lt;/td&gt;
 &lt;td&gt;Index framework; R-TREE, RD-TREE; groups addresses at different layers for precision&lt;/td&gt;
 &lt;td&gt;Spatial operators: &lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; distance, &lt;code&gt;&amp;lt;&amp;lt;&lt;/code&gt; left-of, &lt;code&gt;@&amp;gt;&lt;/code&gt; contains etc.&lt;/td&gt;
 &lt;td&gt;Layer height&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes (supports KNN)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;GIS; KNN; frequently updated full-text index&lt;/td&gt;
 &lt;td&gt;GIS, multi-dimensional data&lt;/td&gt;
 &lt;td&gt;Special-case scenarios&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sp-GiST/Q-tree&lt;/td&gt;
 &lt;td&gt;(sp-GiST is framework; index excludes overlapping data) Q-tree: each node has 4 internal nodes&lt;/td&gt;
 &lt;td&gt;Spatial operators: up/down/left/right, equality, contains&lt;/td&gt;
 &lt;td&gt;Layer height&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;GIS&lt;/td&gt;
 &lt;td&gt;GIS&lt;/td&gt;
 &lt;td&gt;GIS&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sp-GiST/k-d tree&lt;/td&gt;
 &lt;td&gt;k-d tree: splits multi-dimensional space at nodes until no further split&lt;/td&gt;
 &lt;td&gt;Spatial operators&lt;/td&gt;
 &lt;td&gt;Min O(k), avg O(logN), max O(N/2)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;GIS; multi-dimensional data&lt;/td&gt;
 &lt;td&gt;GIS, multi-dimensional data&lt;/td&gt;
 &lt;td&gt;Special-case scenarios&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;sp-GiST/radix-tree&lt;/td&gt;
 &lt;td&gt;radix-tree: each child synthesizes its parent&lt;/td&gt;
 &lt;td&gt;Common operators: =, &amp;gt;, ~ etc.&lt;/td&gt;
 &lt;td&gt;Min O(1), max O(N)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Scenarios without common data&lt;/td&gt;
 &lt;td&gt;Supports common operators beyond GIST&lt;/td&gt;
 &lt;td&gt;Limited scenarios; can be very slow&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GIN&lt;/td&gt;
 &lt;td&gt;Index framework; similar to btree: branch stores token ranges, leaf stores tokens and ctids; one token pointing to multiple ctids may have sub-posting-tree; fast update enabled adds linked-list space for incremental data&lt;/td&gt;
 &lt;td&gt;Operators vary slightly by data type; generally @@ contains&lt;/td&gt;
 &lt;td&gt;Related to text length/token repetition; approx O(logN)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;No (branches ordered but no token position info)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Key-contains-multiple-values scenarios: array, full text, JSON, many columns&lt;/td&gt;
 &lt;td&gt;Best choice for multi-value key scenarios&lt;/td&gt;
 &lt;td&gt;Updates need proper strategy&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BRIN&lt;/td&gt;
 &lt;td&gt;Non-tree: groups data pages by range; rev index layer stores only key ranges and ctids&lt;/td&gt;
 &lt;td&gt;Common operators: &amp;lt; &amp;lt;= = &amp;gt;= &amp;gt;&lt;/td&gt;
 &lt;td&gt;Page lookup O(1); data return O(N), N=recheck rows&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Not strictly ordered, only suits ordered data&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;Sequential storage (time-series, auto-increment); very large tables; nearly no updates; range queries&lt;/td&gt;
 &lt;td&gt;Very small index&lt;/td&gt;
 &lt;td&gt;Extremely demanding on correlation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;RUM&lt;/td&gt;
 &lt;td&gt;Similar to GIN, but additionally stores token position info&lt;/td&gt;
 &lt;td&gt;Includes GIN operators plus position operators&lt;/td&gt;
 &lt;td&gt;Related to text length/token repetition; approx O(logN)&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;Yes (supports KNN lookup)&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Key-contains-multiple-values scenarios; suitable for KNN&lt;/td&gt;
 &lt;td&gt;Stores position info beyond GIN&lt;/td&gt;
 &lt;td&gt;Requires extension installation&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;BLOOM&lt;/td&gt;
 &lt;td&gt;Each field hashed and truncated; non-tree, bitmap filtering&lt;/td&gt;
 &lt;td&gt;Common operators: &amp;lt; &amp;lt;= = &amp;gt;= &amp;gt;&lt;/td&gt;
 &lt;td&gt;Miss: O(1); hit: O(N), N=recheck rows&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;Suitable for miss scenarios&lt;/td&gt;
 &lt;td&gt;Can be very fast&lt;/td&gt;
 &lt;td&gt;Can be very slow on recheck&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Additional index section references:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://it.badykov.com/blog/2020/03/21/postgresql-indexes/" target="_blank" rel="noreferrer"&gt;Types of PostgreSQL Indexes. Short and clear&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://leopard.in.ua/2015/04/13/postgresql-indexes" target="_blank" rel="noreferrer"&gt;https://leopard.in.ua/2015/04/13/postgresql-indexes&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pic.huodongjia.com/ganhuodocs/2017-07-15/1500104265.79.pdf" target="_blank" rel="noreferrer"&gt;https://pic.huodongjia.com/ganhuodocs/2017-07-15/1500104265.79.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://developer.aliyun.com/article/698090?spm=a2c6h.12873639.article-detail.43.702e7149IBMYL9" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/698090?spm=a2c6h.12873639.article-detail.43.702e7149IBMYL9&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgresql.us/events/pgopen2019/sessions/session/647/slides/45/look-it-up.pdf" target="_blank" rel="noreferrer"&gt;https://postgresql.us/events/pgopen2019/sessions/session/647/slides/45/look-it-up.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf" target="_blank" rel="noreferrer"&gt;https://www.pgcon.org/2016/schedule/attachments/434_Index-internals-PGCon2016.pdf&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;17. How Row Locks Are Implemented, Whether Stored in Shared Memory
 &lt;div id="17-how-row-locks-are-implemented-whether-stored-in-shared-memory" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#17-how-row-locks-are-implemented-whether-stored-in-shared-memory" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Row locks in PG are in the row header, not implemented in memory.&lt;/p&gt;
&lt;p&gt;(1) After t1 updates without committing, it acquires exclusive locks on relation and transactionid:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/16040258a95a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(2) t2 updating the same row gets blocked; this blocking is implemented via transactionid sharelock. t2 acquires both relation and tuple locks:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2cca36c19235.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d3b2e8a88a88.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(3) t3 updating this row gets blocked via tuple exclusive lock:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9f0527a73a0a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b5ea7c9fe3f1.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;In summary, &lt;strong&gt;PG row locks are implemented jointly via transactionid locks, relation locks, and tuple locks:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5160903bb82b.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;《postgresql-internals-14》&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5968005" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5968005&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;18. Differences Between Streaming Replication and Logical Replication, and Their Applicable Scenarios
 &lt;div id="18-differences-between-streaming-replication-and-logical-replication-and-their-applicable-scenarios" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#18-differences-between-streaming-replication-and-logical-replication-and-their-applicable-scenarios" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Streaming replication here generally refers to PG physical replication, synchronizing full WAL logs downstream for replay by the downstream PG instance at the physical block level:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d8149234af0e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Logical replication requires logically decoding transaction information from WAL for relevant tables, ordering transactions via reorder buffer, then outputting data in the form determined by the output plugin. The downstream need not be a PG instance. Must have replication slots managing logical decoding, output plugin, reorder buffer, replication positions, etc., plus knowledge of replica identity, slot/sender status, and more:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2fde90f69b14.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Logical replication has many issues but is increasingly widely used and is a key focus area for PG community updates.&lt;/p&gt;
&lt;p&gt;For example (incomplete list):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;logical_decoding_work_mem is no longer hardcoded 4096 (changes); it&amp;rsquo;s now a configurable GUC parameter. Decoding spill issues are somewhat mitigated&lt;/li&gt;
&lt;li&gt;PG14+ supports streaming logical replication: uncommitted transactions can transmit data downstream; subsequent commit info determines whether to apply the changes&lt;/li&gt;
&lt;li&gt;Standby servers support replication slots; logical replication can be established on standbys&lt;/li&gt;
&lt;li&gt;Failover slots (in progress?)&lt;/li&gt;
&lt;li&gt;Many more updates&amp;hellip;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/120000817" target="_blank" rel="noreferrer"&gt;PG流复制详解&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;pg内功修炼：逻辑复制&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;19. What Is Streaming Replication Conflict and Why It Occurs
 &lt;div id="19-what-is-streaming-replication-conflict-and-why-it-occurs" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#19-what-is-streaming-replication-conflict-and-why-it-occurs" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Cause of conflict:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The standby is running a query on a table (from application or manual connection). Meanwhile, the primary executes DROP TABLE, written to WAL and transmitted to the standby for replay. To ensure data consistency, PostgreSQL must rapidly replay WAL. The DROP TABLE and SELECT then conflict. Since the primary doesn&amp;rsquo;t know the standby&amp;rsquo;s transaction state, and the standby must stay consistent with the primary, &amp;ldquo;query conflict&amp;rdquo; occurs.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Conflict scenarios:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Primary exclusive locks (including explicit LOCK commands and various DDL)&lt;/li&gt;
&lt;li&gt;Primary vacuum cleaning dead tuples — if the standby is using those tuples, conflict arises&lt;/li&gt;
&lt;li&gt;Primary drops a tablespace that the standby query is using&lt;/li&gt;
&lt;li&gt;Primary drops a database that the standby is using&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Mitigating query conflicts (can&amp;rsquo;t fully resolve):&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;hot_standby_feedback&lt;/code&gt;: standby periodically notifies the primary of the minimum active transaction ID (xmin), preventing the primary vacuum from cleaning tuples older than the xmin value.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max_standby_streaming_delay&lt;/code&gt;: standby queries aren&amp;rsquo;t immediately canceled; instead wait for a period before throwing an error if not finished.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;max_standby_archive_delay&lt;/code&gt;: waiting time before canceling standby queries due to conflicts from processing archived WAL logs.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vacuum_defer_cleanup_age&lt;/code&gt;: specifies how many transactions vacuum delays dead tuple cleanup by; i.e., vacuum and vacuum full won&amp;rsquo;t immediately clean just-deleted tuples.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/120000817" target="_blank" rel="noreferrer"&gt;PG流复制详解&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;20. PostgreSQL Permission System Overview
 &lt;div id="20-postgresql-permission-system-overview" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#20-postgresql-permission-system-overview" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b45d38154897.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Hard to summarize comprehensively; it&amp;rsquo;s somewhat complex. Key points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Permission access requires each layer to be &amp;ldquo;open&amp;rdquo;; none can be missing&lt;/li&gt;
&lt;li&gt;Best to separate read-only/read-write/owner users&lt;/li&gt;
&lt;li&gt;Read-only and read-write permissions can be managed via roles&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/jQP36rXZb4sgA71AaIJ-Sw" target="_blank" rel="noreferrer"&gt;PostgreSQL学徒:又被权限搞晕了？拿捏！&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;21. Common High Availability Solutions, Selection Criteria, Pros and Cons
 &lt;div id="21-common-high-availability-solutions-selection-criteria-pros-and-cons" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#21-common-high-availability-solutions-selection-criteria-pros-and-cons" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;HA selection considerations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sync mode choice, availability zones, cross-region multi-active&lt;/li&gt;
&lt;li&gt;Switchover, failover&lt;/li&gt;
&lt;li&gt;Load balancing, read/write separation&lt;/li&gt;
&lt;li&gt;Host, database, and application-level HA&lt;/li&gt;
&lt;li&gt;VIP switching, connection string HA, connection switching&lt;/li&gt;
&lt;li&gt;Solving single point of failure or split-brain; election mechanisms&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Below are some known architectures:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pgpool-II+watchdog&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/748bd7fc3712.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://www.pgpool.net/docs/latest/en/html/example-cluster.html）&lt;/p&gt;
&lt;p&gt;Pros: automatic failover, read/write separation, load balancing, watchdog election
Cons: complex configuration, pgpool doesn&amp;rsquo;t fully support all PG features, pgpool performance overhead, depends on watchdog election&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;patroni+etcd&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ed1ce367a7b8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Pros: GUI (patroni), automatic failover, majority election
Cons: learning curve, doesn&amp;rsquo;t support other databases (patroni)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;patroni+pgbouncer+haproxy+etcd&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e7604c9266a6.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://www.percona.com/sites/default/files/eBook-PostgreSQL-High-Availability.pdf）&lt;/p&gt;
&lt;p&gt;Pros: open-source stack: haproxy for load balancing, pgbouncer for connection pooling, patroni for cluster management, etcd for election
Cons: very complex configuration&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ping An Financial Cloud rasesql architecture&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/78b4331a5822.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://www.ocftcloud.com/ssr/help/database/RASESQL/intro.Architecture）&lt;/p&gt;
&lt;p&gt;Pros: failover support, simple architecture
Cons: same-city remote can&amp;rsquo;t directly read-only access, higher resource usage, no election (?)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Alibaba Cloud Polar-X&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b3c6ace6a20f.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3578c8002447.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（&lt;a href="https://ucc-private-download.oss-cn-beijing.aliyuncs.com/ab3f233b4a4c405986b2a8196cb53b47.pdf?Expires=1708410598&amp;amp;OSSAccessKeyId=LTAIvsP3ECkg4Nm9&amp;amp;Signature=O9UIudjtFyMmQW4eZf2BlClhVDk%3D" target="_blank" rel="noreferrer"&gt;PolarDB for PostgreSQL 三节点功能介绍&lt;/a&gt;）&lt;/p&gt;
&lt;p&gt;Pros: read/write separation, can add non-voting nodes, failover, logger nodes participate in election/data flow/backup
Cons: &amp;hellip;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Google Cloud PG&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Three architecture options:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b85525616eeb.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Google Cloud Native Architecture (MIG):&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0cc376b9b922.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Pros: three options to choose from, well-documented! (the other two derive from open-source architectures with similar pros/cons; MIG cloud-native approach described below)
MIG advantages: doesn&amp;rsquo;t depend on PG native HA; uses Regional persistent disk for data HA. Primary zone network isolation; disk can be attached to zone B in the same region (within 1 minute).
MIG disadvantages: no read replicas; only within-region failover (no multi-region deployment)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Aurora for PG&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6c3c996ceeb5.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Pros: simple architecture, recovered primary node auto-joins cluster, multi-region deployment, standby readable
Cons: (seemingly) no election mechanism; docs heavy on text, light on diagrams&lt;/p&gt;
&lt;p&gt;崔健：PostgreSQL的高可以架构设计与实践&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.pgpool.net/docs/latest/en/html/example-cluster.html" target="_blank" rel="noreferrer"&gt;https://www.pgpool.net/docs/latest/en/html/example-cluster.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.postgres.cn/downfiles/pgconf_2018/PostgresChina2018_%E6%B1%AA%E6%B4%8B_PG%E4%B9%8B%E9%AB%98%E5%8F%AF%E7%94%A8%E7%89%B9%E6%80%A7%E3%80%81%E5%B7%A5%E5%85%B7%E5%8F%8A%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1.pdf" target="_blank" rel="noreferrer"&gt;汪总： Postgresql 高可用&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.tencent.com/developer/article/1185379" target="_blank" rel="noreferrer"&gt;使用Patroni和HAProxy创建高度可用的PostgreSQL集群&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.percona.com/sites/default/files/eBook-PostgreSQL-High-Availability.pdf" target="_blank" rel="noreferrer"&gt;https://www.percona.com/sites/default/files/eBook-PostgreSQL-High-Availability.pdf&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://ucc-private-download.oss-cn-beijing.aliyuncs.com/ab3f233b4a4c405986b2a8196cb53b47.pdf?Expires=1708410598&amp;amp;OSSAccessKeyId=LTAIvsP3ECkg4Nm9&amp;amp;Signature=O9UIudjtFyMmQW4eZf2BlClhVDk%3D" target="_blank" rel="noreferrer"&gt;PolarDB for PostgreSQL 三节点功能介绍&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/architecture/architectures-high-availability-postgresql-clusters-compute-engine" target="_blank" rel="noreferrer"&gt;https://cloud.google.com/architecture/architectures-high-availability-postgresql-clusters-compute-engine&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html" target="_blank" rel="noreferrer"&gt;https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;22. Five Levels of synchronous_commit; Why Standby Queries Can&amp;rsquo;t Immediately See Primary Inserts
 &lt;div id="22-five-levels-of-synchronous_commit-why-standby-queries-cant-immediately-see-primary-inserts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#22-five-levels-of-synchronous_commit-why-standby-queries-cant-immediately-see-primary-inserts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f2ec64d6d8a4.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/120000817" target="_blank" rel="noreferrer"&gt;PG流复制详解&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;23. Transaction ID Wraparound Causes and Maintenance Optimization
 &lt;div id="23-transaction-id-wraparound-causes-and-maintenance-optimization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#23-transaction-id-wraparound-causes-and-maintenance-optimization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Why transaction ID wraparound exists:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Every non-query transaction consumes a transaction ID. Query transactions consume virtual transaction IDs (VXID), which are locally counted. Though VXID has wraparound issues, session restart resets VXID counting, so it&amp;rsquo;s rarely problematic.&lt;/p&gt;
&lt;p&gt;However, transaction IDs have an upper limit. &lt;code&gt;TransactionId&lt;/code&gt; is a 32-bit unsigned integer, storing &lt;code&gt;2^32=4294967296&lt;/code&gt; — about 4.2 billion transactions. At this point, transaction IDs must wrap around to the initial state, which is why transaction IDs form a ring.&lt;/p&gt;
&lt;p&gt;Due to visibility rules, the 4.2 billion transactions must be split in half: one half represents the future, the other the past. The difference between max and min transactions in a PG instance cannot exceed 2.1 billion — hence the 2.1 billion transaction limit.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/bbce62f757b4.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://www.interdb.jp/pg/pgsql05/01.html）&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Transaction ID freezing:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Due to visibility rules, if a visible row (e.g., xid=100) differs from the latest transaction by more than 2.1 billion, it becomes invisible:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/57a0de81e82c.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（Forgot the source; look it up）&lt;/p&gt;
&lt;p&gt;To solve this, the transaction ID freezing mechanism was introduced. Freezing sets the xmin of overly old tuples to FrozenXID=2, older than all normal transactions. That is, txid=2 is visible to all normal transactions (txid&amp;gt;=3). In version 9.4+, t_infomask&amp;rsquo;s xmin_frozen flag indicates frozen tuples rather than rewriting t_xmin to 2.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Lazy mode:&lt;/strong&gt; The VM file was originally designed to reduce vacuum overhead by letting vacuum skip pages with no dead tuples (all-visible). Later (pg9.4), the freeze process was enhanced so lazy mode freezing can also skip all-visible pages during vacuum.&lt;/p&gt;
&lt;p&gt;Lazy mode freeze trigger: triggered alongside vacuum operation (seems to have no independent trigger condition???)&lt;/p&gt;
&lt;p&gt;Lazy mode freeze which tuples: except pages marked all-visible in VM that get skipped, freezes tuples whose xmin-to-active-transaction-ID (actually oldestxmin) gap exceeds &lt;code&gt;vacuum_freeze_min_age&lt;/code&gt; (default 50M), marking them xmin_frozen. In the diagram below, tuple 9&amp;rsquo;s xmin=3000 won&amp;rsquo;t be frozen.&lt;/p&gt;
&lt;p&gt;Lazy mode is more of a vacuum side-effect: since we&amp;rsquo;re already concurrently vacuum scanning and cleaning dead tuples with pages already scanned, we might as well freeze eligible tuples.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/47912e7b0750.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Eager mode:&lt;/strong&gt; Lazy mode has a problem: it works alongside vacuum, skipping pages with no dead tuples (all-visible). If a page contains only live tuples (all-visible but not all-frozen) with very old xmin values, lazy mode alone can&amp;rsquo;t freeze them. So eager mode is needed: skip pages already marked all-frozen in VM and freeze the rest. In real scenarios, eager mode is typically the one running periodically and requiring attention: &lt;strong&gt;even if only one page in a table has tuples that are all inserts (even just one static page), eager mode is needed&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Eager mode freeze triggers:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Vacuum_freeze_table_age&lt;/code&gt; for vacuum operations: when the &lt;strong&gt;database-level&lt;/strong&gt; minimum xmin (actually &lt;code&gt;pg_database.datfrozenxid&lt;/code&gt;, also the minimum of all &lt;code&gt;pg_class.relfrozenxid&lt;/code&gt; in that database) and the active transaction ID (actually oldestxmin) gap exceeds &lt;code&gt;Vacuum_freeze_table_age&lt;/code&gt; (default 150M), &lt;strong&gt;vacuum&lt;/strong&gt; triggers eager mode freezing.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;autovacuum_freeze_max_age&lt;/code&gt; for autovacuum: whether lazy mode or eager mode &lt;code&gt;Vacuum_freeze_table_age&lt;/code&gt;, vacuum must first be triggered. Relying solely on vacuum&amp;rsquo;s own trigger conditions for freezing is unreliable; a freeze-specific deadline parameter is needed: &lt;code&gt;autovacuum_freeze_max_age&lt;/code&gt;. When tuple age exceeds &lt;code&gt;autovacuum_freeze_max_age&lt;/code&gt; (200M), autovacuum is force-triggered for freezing. Even if autovacuum is disabled, this deadline-triggered freeze still works.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Eager mode freeze which tuples: similar to lazy mode, except for all-frozen pages (lazy uses all-visible — different), freezes tuples whose xmin-to-active-transaction-ID gap exceeds &lt;code&gt;vacuum_freeze_min_age&lt;/code&gt; (default 50M). In the diagram, tuple 11 is not frozen.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d24f548bb484.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;vacuum freeze command:&lt;/strong&gt; &lt;code&gt;VACUUM FREEZE&lt;/code&gt; is equivalent to setting vacuum_freeze_min_age and vacuum_freeze_table_age to 0, performing eager mode freezing for all inactive xmin tuples.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;vacuum_failsafe_age:&lt;/strong&gt; Since large table vacuum operations are very slow, freeze may not finish before transaction ID wraparound occurs. Because freeze is done by the vacuum process, and vacuum has many other operations and parameter settings, to accelerate freeze, cost-based vacuuming, buffer strategy, and index vacuuming are all ignored. Parameter default is 1.6B; actually, during vacuum the effective value is no lower than autovacuum_freeze_max_age * 105%.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CLOG may also be updated:&lt;/strong&gt; Additionally, if freezing updates pg_database.datfrozenxid, unnecessary CLOG is also cleaned. CLOG records transaction status for determining &amp;ldquo;relatively new&amp;rdquo; transaction and tuple visibility. If a database&amp;rsquo;s frozenxid has been advanced recently, meaning those &amp;ldquo;old&amp;rdquo; tuples have been marked as frozen — always visible — then &amp;ldquo;old&amp;rdquo; transaction status info in CLOG can be discarded.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/1e864c9bc4a1.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Maintenance optimization:&lt;/strong&gt; (summarized from Can Zong&amp;rsquo;s summary)&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Monitor pg_database.frozenxid in production. When approaching trigger values, proactively run VACUUM FREEZE during low-traffic windows rather than waiting for passive database triggers.&lt;/li&gt;
&lt;li&gt;Partition tables; overly large tables cause long prevent-wraparound operations&lt;/li&gt;
&lt;li&gt;Set different vacuum ages for large tables: ALTER TABLE test SET (autovacuum_freeze_max_age=xxxx);&lt;/li&gt;
&lt;li&gt;User-scheduled freeze: during low-traffic windows, VACUUM FREEZE large, aged tables&lt;/li&gt;
&lt;li&gt;Watch for freeze-blocking scenarios: long transactions, replication slots, hot_standby_feedback, pg_dump, cursors, orphan transactions&lt;/li&gt;
&lt;li&gt;Set sufficient worker processes to avoid vacuum scenarios queuing&lt;/li&gt;
&lt;li&gt;If load is a concern, consider enabling cost-based vacuuming (vacuum_cost_delay etc.)&lt;/li&gt;
&lt;li&gt;autovacuum_freeze_max_age should exceed vacuum_freeze_table_age to leave room for manual vacuum. Official recommendation: vacuum_freeze_table_age = 0.95 * autovacuum_freeze_max_age; if vacuum_freeze_table_age is below 0.95 * autovacuum_freeze_max_age, vacuum still takes 0.95 * autovacuum_freeze_max_age.&lt;/li&gt;
&lt;li&gt;vacuum_failsafe_age: PG14+ set reasonable vacuum_failsafe_age to accelerate large table freeze and prevent wraparound; should exceed autovacuum_freeze_max_age * 105%.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a href="https://www.interdb.jp/pg/" target="_blank" rel="noreferrer"&gt;https://www.interdb.jp/pg/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/p6aFhghpDEGu6lIBD8A5Yw" target="_blank" rel="noreferrer"&gt;深入理解PostgreSQL冻结炸弹&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/130782577" target="_blank" rel="noreferrer"&gt;pg事务：事务ID&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;24. Vacuum / Autovacuum Functions and Tuning
 &lt;div id="24-vacuum--autovacuum-functions-and-tuning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#24-vacuum--autovacuum-functions-and-tuning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Functions:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Clean up &amp;ldquo;dead tuples&amp;rdquo; left by UPDATE or DELETE operations&lt;/li&gt;
&lt;li&gt;Track available space in table blocks, update free space map&lt;/li&gt;
&lt;li&gt;Update visibility map needed for index-only scans&lt;/li&gt;
&lt;li&gt;&amp;ldquo;Freeze&amp;rdquo; rows in tables to prevent transaction ID wraparound&lt;/li&gt;
&lt;li&gt;Periodically ANALYZE to update statistics&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Tuning:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Set sufficient worker processes to avoid vacuum queuing&lt;/li&gt;
&lt;li&gt;Increase maintenance_work_mem (or autovacuum_work_mem)&lt;/li&gt;
&lt;li&gt;Watch for vacuum-blocking scenarios: long transactions, replication slots, hot_standby_feedback, pg_dump, cursors, orphan transactions&lt;/li&gt;
&lt;li&gt;For special tables (business-sensitive, large), set separate autovacuum trigger thresholds (threshold, fillfactor; insert threshold, fillfactor): dead tuple cleanup threshold, stats update threshold, wraparound prevention threshold&lt;/li&gt;
&lt;li&gt;For special tables, disable per-table autovacuum and run vacuum during off-peak hours for dead tuple cleanup, statistics, and wraparound&lt;/li&gt;
&lt;li&gt;If business load is a concern, enable cost-based vacuuming with sleep at thresholds&lt;/li&gt;
&lt;li&gt;Partition tables to avoid vacuum running endlessly or restarting immediately after finishing&lt;/li&gt;
&lt;li&gt;Avoid VACUUM FULL (8-level lock). Use logical replication + rename or pg_repack for table/index bloat handling, improving efficiency and reclaiming space&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 class="relative group"&gt;25. Function Volatility Categories and Why Functions Need EXECUTE
 &lt;div id="25-function-volatility-categories-and-why-functions-need-execute" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#25-function-volatility-categories-and-why-functions-need-execute" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;VOLATILE (unstable, default):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Can do anything, including modifying the database&lt;/li&gt;
&lt;li&gt;Within the same transaction, even with identical parameters, may return different results&lt;/li&gt;
&lt;li&gt;Obtains a snapshot for &lt;strong&gt;each query execution&lt;/strong&gt; within the function, so even identical interactive queries within the same function may produce different results due to changing visible data&lt;/li&gt;
&lt;li&gt;Since recalculation is needed each time, the optimizer can&amp;rsquo;t pre-estimate; performance may be poor&lt;/li&gt;
&lt;li&gt;Function indexes not supported&lt;/li&gt;
&lt;li&gt;Typical functions: timeofday(), random(), all modifying functions&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;STABLE:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cannot modify the database&lt;/li&gt;
&lt;li&gt;Within the same transaction, identical parameters return identical results. Snapshot obtained at function start; internal queries don&amp;rsquo;t re-obtain; identical interactive queries within the function produce consistent results&lt;/li&gt;
&lt;li&gt;Function indexes not supported&lt;/li&gt;
&lt;li&gt;Typical functions: current_timestamp family; regardless of how many times called within a transaction, only one value&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;IMMUTABLE (very stable):&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cannot modify the database&lt;/li&gt;
&lt;li&gt;Given identical parameters, always returns identical results. Snapshot acquisition principle same as STABLE&lt;/li&gt;
&lt;li&gt;Key difference from STABLE: IMMUTABLE not only caches the plan but reuses this plan in subsequent executions&lt;/li&gt;
&lt;li&gt;Function indexes supported&lt;/li&gt;
&lt;li&gt;Some database-parameter-dependent functions shouldn&amp;rsquo;t be marked IMMUTABLE, e.g., timezone-related functions should be STABLE&lt;/li&gt;
&lt;li&gt;Typical function: calculating 1+2&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Why functions need EXECUTE:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PREPARE: parsed, analyzed, and rewritten&lt;/p&gt;
&lt;p&gt;EXECUTE: planned and executed&lt;/p&gt;
&lt;p&gt;Forcing SQL hard parsing: prevents SQL from using incorrect execution plans due to data skew.&lt;/p&gt;
&lt;p&gt;Unlike plain SQL, plpgsql defaults to Plan Caching, automatically executing SQL as PREPARE, attempting to generate and cache generic plans for soft parsing. However, with data skew, cached execution plans may be inefficient and unacceptable for core business. In such cases, consider using EXECUTE statements to force per-variable-value execution plans, improving accuracy.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/128885660" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/Hehuyi_In/article/details/128885660&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.postgresql.org/docs/16/xfunc-volatility.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/16/xfunc-volatility.html&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;26. Why Use CREATE INDEX CONCURRENTLY and Its Hazards
 &lt;div id="26-why-use-create-index-concurrently-and-its-hazards" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#26-why-use-create-index-concurrently-and-its-hazards" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Why CIC:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;CREATE INDEX requires a ShareLock, which conflicts with DML&amp;rsquo;s RowExclusiveLock. So online business shouldn&amp;rsquo;t directly use CREATE INDEX. CIC uses ShareUpdateExclusiveLock, which doesn&amp;rsquo;t conflict with DML locks, so CIC is recommended for index creation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;CIC process:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Insert index metadata into system catalogs (pg_class, pg_index), then open two transactions for two scans&lt;/li&gt;
&lt;li&gt;Open transaction 1, get snapshot1&lt;/li&gt;
&lt;li&gt;Before scanning table, wait for all transactions that modified the table (insert/delete/update) to finish&lt;/li&gt;
&lt;li&gt;Scan table and build index&lt;/li&gt;
&lt;li&gt;End transaction 1&lt;/li&gt;
&lt;li&gt;Open transaction 2, get snapshot2&lt;/li&gt;
&lt;li&gt;Before second scan, wait for all transactions that modified the table to finish&lt;/li&gt;
&lt;li&gt;DML on the table from transactions started after snapshot2 will update this index&lt;/li&gt;
&lt;li&gt;Second table scan, update index (version numbers from tuples allow identifying records changed between snapshot1 and snapshot2, merging them into the index)&lt;/li&gt;
&lt;li&gt;After index update, wait for transactions holding snapshots that started before transaction 2 to finish&lt;/li&gt;
&lt;li&gt;End index creation. Index becomes visible.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;CIC issues:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Opens two transactions sequentially, scanning the table one extra time vs CREATE INDEX&lt;/li&gt;
&lt;li&gt;Must wait for long transactions to finish before scanning can begin&lt;/li&gt;
&lt;li&gt;CIC-created indexes may become invalid
&lt;ul&gt;
&lt;li&gt;CIC interrupted abnormally leaves an invalid index&lt;/li&gt;
&lt;li&gt;During CIC unique index creation, inserted/updated data violating unique constraints also causes CIC failure leaving an invalid index&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Invalid indexes still get updated by DML&lt;/li&gt;
&lt;li&gt;Partition parent tables don&amp;rsquo;t support CIC index creation; create indexes with CIC on child partitions one by one, then create the index on the parent with ONLY&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/Sayutoyj7QmV5Nl8EFlwiQ" target="_blank" rel="noreferrer"&gt;学徒 深度剖析CIC&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;27. HOT Principle
 &lt;div id="27-hot-principle" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#27-hot-principle" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;HOT:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Without HOT, every tuple update would update indexes. Below, one additional updated tuple adds one index entry, and the old index entry points to the dead tuple. This causes index update, index space, and index vacuum pressure.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4a5a7f3ac437.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;With HOT, in-page updates only update the tuple, not the index:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e618933424af.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;HOT tuples correspond to HEAP_HOT_UPDATED and HEAP_ONLY_TUPLE bits in infomask:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tt(a int);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxtt &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tt(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- execute multiple times
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tt; &lt;span style="color:#75715e"&gt;-- after update, run a visibility check to write remaining clog commit info to tuple header
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tt&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+-----------+--------+-----------------------------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_HOT_UPDATED&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;lp(line pointer)=1&amp;rsquo;s tuple points to row 2 via ctid(0,2); row 2 points to row 3&amp;hellip; ultimately to row 5. ctid forms a chain pointing to the final data row. Dead tuples all carry HEAP_HOT_UPDATED, indicating the tuple is an updated row on the HOT chain; the chain tail has HEAP_ONLY_TUPLE, marking the end of the HOT chain.&lt;/p&gt;
&lt;p&gt;With HOT, vacuum only cleans dead tuples within the page without updating indexes:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c220fe22e28a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; tt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;VACUUM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tt&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+-----------+--------+----------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;After vacuum, dead tuples are cleaned.&lt;/p&gt;
&lt;p&gt;On subsequent updates, a new HOT chain begins:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; tt &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid, raw_flags, combined_flags
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;tt&amp;#39;&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; t_infomask &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; t_infomask2 &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----+-----------+--------+-----------------------------------------------------------------------------------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID,HEAP_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LP_NORMAL &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;HEAP_XMIN_COMMITTED,HEAP_XMAX_COMMITTED,HEAP_UPDATED,HEAP_HOT_UPDATED,HEAP_ONLY_TUPLE&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{}&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Why doesn&amp;rsquo;t the new HOT chain start from lp1? Because lp1 is already occupied — the index still points to lp1.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; itemoffset, ctid, &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;, dead, htid, tids[&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;] &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; some_tids
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bt_page_items(&lt;span style="color:#e6db74"&gt;&amp;#39;idxtt&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; itemoffset &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dead &lt;span style="color:#f92672"&gt;|&lt;/span&gt; htid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; some_tids 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------+-------+-------------------------+------+-------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;htid (0,1) is page 0, lp 1. Vacuum only cleaned the data page; the index was not updated. Vacuum only cleaned dead tuples and the middle of the HOT chain; HOT chain head and tail ctids were untouched.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;INDEX ONLY SCAN:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Index-only scan is a common and efficient scan method across databases: it returns results by accessing only index pages without touching data pages. However, this is problematic in PG because visibility information is stored in data page headers, not index pages. Accessing only the index can&amp;rsquo;t support MVCC in principle.&lt;/p&gt;
&lt;p&gt;The VM file not only supports vacuum skipping all-visible pages but also supports INDEX ONLY SCAN for visibility determination on all-visible pages:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b2b9809f61d7.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Reference: interdb&lt;/p&gt;

&lt;h3 class="relative group"&gt;28. Does PostgreSQL Have Lock Escalation?
 &lt;div id="28-does-postgresql-have-lock-escalation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#28-does-postgresql-have-lock-escalation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Basically no.&lt;/p&gt;
&lt;p&gt;Only Predicate lock has escalation. Predicate lock is used when serializable isolation is needed, intended to lock predicates and prevent data anomalies to achieve serializability. In PG, this corresponds to SIReadLock.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Predicate lock&amp;rsquo;s finest granularity is locking rows within a range&lt;/li&gt;
&lt;li&gt;When row count exceeds a threshold, lock the corresponding page&lt;/li&gt;
&lt;li&gt;When page count exceeds a threshold, lock the corresponding table&lt;/li&gt;
&lt;li&gt;Predicate lock has only 3 lock levels: row, page, table&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5968020" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5968020&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;29. Replication Slot Functions and Hazards
 &lt;div id="29-replication-slot-functions-and-hazards" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#29-replication-slot-functions-and-hazards" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;For physical replication, replication slots aren&amp;rsquo;t strictly necessary; hot_standby_feedback and other parameters can manage WAL. With replication slots, those parameters become unnecessary — slots manage WAL logs.&lt;/p&gt;
&lt;p&gt;For logical replication, replication slots are mandatory; one logical replication link corresponds to one slot. For logical replication, slots manage not only WAL logs but also logical decoding, output plugin, decoding/sending positions (LSN), allowing retransmission of decoded logs after replication interruption.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a3a4118829be.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Replication slot hazards:&lt;/p&gt;
&lt;p&gt;Actually, replication slots have no inherent hazards. Their primary function is simplifying WAL log management. Without slots, you still need WAL management strategies. The PG community recommends using slots. Just note: always clean up unused slots to prevent them holding old positions that block WAL cleanup, filling the disk. Additionally, DBAs shouldn&amp;rsquo;t casually drop slots — once dropped, position info is lost, and downstream links may need data reinitialization and resynchronization. Better to confirm whether the replication link can restart syncing.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;pg内功修炼：逻辑复制&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;30. Why Deadlocks Occur and Deadlock Detection Mechanism
 &lt;div id="30-why-deadlocks-occur-and-deadlock-detection-mechanism" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#30-why-deadlocks-occur-and-deadlock-detection-mechanism" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a158c01929e8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Simplest case: transaction T1 holds resource 1, transaction T2 holds resource 2. If T1 tries to acquire resource 2 and T2 tries to acquire resource 1, a deadlock forms. Without management, deadlocks can wait indefinitely, so all DBMS have deadlock detection. Deadlocks usually indicate business logic issues. If no explicit cancellation of one transaction in the &amp;ldquo;ring&amp;rdquo; breaks it, PG auto-detects deadlocks and force-terminates one transaction via the &lt;code&gt;deadlock_timeout&lt;/code&gt; parameter (default 1s); other transactions in the &amp;ldquo;ring&amp;rdquo; can continue.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/blog/pgsql/5968020" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/blog/pgsql/5968020&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;31. SQL Performance Troubleshooting Approaches
 &lt;div id="31-sql-performance-troubleshooting-approaches" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#31-sql-performance-troubleshooting-approaches" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2e1b3f17a0b2.png" alt="Insert image description here" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;32. Why Use Partitioned Tables, Advantages and Disadvantages
 &lt;div id="32-why-use-partitioned-tables-advantages-and-disadvantages" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#32-why-use-partitioned-tables-advantages-and-disadvantages" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Partitioned tables split table data into smaller physical fragments to improve performance, availability, and manageability, transparent to applications. Partitioned tables are a common optimization for large tables in relational databases. DBMS generally provide partition management, and applications can directly access partitioned tables without architecture changes — though good performance requires proper partition access patterns.&lt;/p&gt;
&lt;p&gt;PG natively supports declarative partitioning and inheritance partitioning. Common plugin-based implementations include pg_pathman. PG10 introduced declarative partitioning with many enhancements in subsequent versions (see PostgreSQL Partitioned Tables — History). PG12+ with declarative partitioning is recommended.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Advantages of partitioned tables:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SQL performance improvement. In some scenarios, e.g., splitting large data into multiple partitions where SQL only queries one partition, partition pruning can dramatically improve performance&lt;/li&gt;
&lt;li&gt;Partitions work with indexes. Accessing one partition&amp;rsquo;s index is more efficient than accessing an unpartitioned large index&lt;/li&gt;
&lt;li&gt;Dropping a partition is more efficient than deleting many rows. Common in time-range partitioning: dropping an unused historical partition is very fast, while DELETE without partitions is slow and requires extra maintenance&lt;/li&gt;
&lt;li&gt;Faster vacuum. Vacuuming a large table for old version cleanup or statistics collection can be very slow; SQL problems may arise before vacuum finishes. With partitions, vacuum is much faster&lt;/li&gt;
&lt;li&gt;IO distribution. Different partitions can be placed on different paths/disks. Rarely used data can go on cheaper disks&lt;/li&gt;
&lt;li&gt;More maintenance techniques. Directly maintaining a huge table is very difficult (e.g., vacuuming an extremely large table has many issues), while partitioned table partitions can be vacuumed individually. Also, attach/detach, local indexes/constraints etc. can be flexibly used&lt;/li&gt;
&lt;li&gt;May enable partition-wise join or partition-wise aggregation features&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Disadvantages of partitioned tables:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;In PG, partitions are also tables; too many tables cause slow parsing and large relcache metadata caching&lt;/li&gt;
&lt;li&gt;Too many tables may cause errors. Reference: &lt;a href="https://editor.csdn.net/md/?articleId=131497779" target="_blank" rel="noreferrer"&gt;较少的分区也报错too many range table entries&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Even if partition count doesn&amp;rsquo;t error, without partition pruning during plan generation (may happen at execution), EXPLAIN output becomes very large, and logs become bloated with long plans&lt;/li&gt;
&lt;li&gt;Strange issues: &lt;a href="https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&amp;amp;mid=2247489813&amp;amp;idx=1&amp;amp;sn=22360e2bfd40fc2d0caed0a9d825b1d4&amp;amp;chksm=fa663124cd11b832953e789127927ffa0d63d6c948ca8934d5317b8eaae6e71374041ec038f7&amp;amp;mpshare=1&amp;amp;srcid=0728JrXnHdxnfgRVzqosBNcv&amp;amp;sharer_sharetime=1690509489198&amp;amp;sharer_shareid=0412ea33e50b471b98d8859a5c431367&amp;amp;from=singlemessage&amp;amp;scene=1&amp;amp;subscene=10000&amp;amp;sessionid=1690509419&amp;amp;clicktime=1690509545&amp;amp;enterid=1690509545&amp;amp;ascene=1&amp;amp;fasttmpl_type=0&amp;amp;fasttmpl_fullversion=6785798-en_US-zip&amp;amp;fasttmpl_flag=0&amp;amp;realreporttime=1690509545257&amp;amp;devicetype=android-29&amp;amp;version=28002658&amp;amp;nettype=WIFI&amp;amp;abtest_cookie=AAACAA%3D%3D&amp;amp;lang=en&amp;amp;countrycode=CN&amp;amp;exportkey=n_ChQIAhIQCCtq2jm3UsFznlVjxFEOWBLaAQIE97dBBAEAAAAAABKTCFyWAsoAAAAOpnltbLcz9gKNyK89dVj0LyxnG1pA6NiO6PHIsQ0Hy2N7QRbizb9SHdquaFOpOqANqG8jLDcioswZyRnYknjG4bSqNIIKm%2BpRIlK%2FVJxuwolH2%2FQJKSLg4YjccDktYYscUDvYSfHFx1ScEXZkOkbVqrvbBCPy6Gh2GnzulFuuIU68afNtsoBdzZTqHYbL0BfsAUhsz1iGAfSep642UT2CBpWSHWJQvndnwhZxjJ6%2FWO%2FI%2FqwncggiVeDNiv4vwXhluDNn&amp;amp;pass_ticket=mrpzS3wggBDzL9Ua2FmX5v1rYh6zKOnQ4og6oKcKv0ZXRfNBSUpSkGdTAcfXqgDo&amp;amp;wx_header=3" target="_blank" rel="noreferrer"&gt;不同用户查看到不同的执行计划&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Major limitations of PG native partitioned tables:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No native automatic partition creation&lt;/li&gt;
&lt;li&gt;Only local indexes supported, no global indexes&lt;/li&gt;
&lt;li&gt;Primary key must include the partition key. PostgreSQL currently can only enforce uniqueness within individual partitions, hence this limitation. Oracle and MySQL don&amp;rsquo;t have this restriction&lt;/li&gt;
&lt;li&gt;Unique index must include the partition key (same reason as primary key)&lt;/li&gt;
&lt;li&gt;Cannot create global constraints (child tables inherit but can&amp;rsquo;t create table-level global constraints)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Partitioned table maintenance:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;New partitions without data: directly use PARTITION OF (8-level lock; just watch for long transactions)&lt;/li&gt;
&lt;li&gt;New partitions with data: use ATTACH (4-level lock, doesn&amp;rsquo;t block reads/writes) to add; if needed, pre-add partition constraints to reduce constraint check time. DETACH CONCURRENTLY (4-level lock) to remove partitions&lt;/li&gt;
&lt;li&gt;Note: ATTACH doesn&amp;rsquo;t auto-create indexes, constraints, defaults, or row-level triggers like PARTITION OF does; create them beforehand&lt;/li&gt;
&lt;li&gt;Partition parent table indexes don&amp;rsquo;t support CIC. Correct approach for partition index creation: 1) create ONLY on parent 2) create CONCURRENTLY on partitions 3) ATTACH all partition indexes to the parent; the index auto-marks as valid&lt;/li&gt;
&lt;li&gt;Increasing column length won&amp;rsquo;t rebuild indexes, EXCEPT for partitioned tables where it WILL rebuild indexes&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655" target="_blank" rel="noreferrer"&gt;PostgreSQL分区表&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;33. Soft Parsing vs Hard Parsing Concepts
 &lt;div id="33-soft-parsing-vs-hard-parsing-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#33-soft-parsing-vs-hard-parsing-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Hard parsing:&lt;/strong&gt; For a SQL statement, the optimizer must first perform lexical and syntax analysis, converting it into a query tree PG can understand, then rewrite and optimize it, generating an execution plan tree before the executor can execute. This full parsing process is called hard parsing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Soft parsing:&lt;/strong&gt; Obviously, performing such complex steps for every statement each time would be very inefficient. So PG caches SQL execution plans in process memory. When certain conditions are met, cached plans can be used directly, improving efficiency. This is soft parsing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PG bind-variable SQL parsing: the five-time mechanism:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The five-time mechanism prevents data skew from causing inefficient execution plans.&lt;/p&gt;
&lt;p&gt;First 5 executions: each generates an execution plan based on actual bound variables (called custom plans) — this is hard parsing.
6th execution: generates a generic execution plan (generic plan) and compares it with the previous 5 plans.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If not worse than the first 5: the 6th plan is fixed; subsequently, regardless of parameter changes, the SQL execution plan won&amp;rsquo;t change — this is soft parsing&lt;/li&gt;
&lt;li&gt;If worse than any of the first 5 plans: every subsequent execution regenerates the plan — all hard parsing&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Forcing soft/hard parsing:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;PG 12 introduced the &lt;code&gt;force_custom_plan&lt;/code&gt; parameter with options:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;auto: default, uses the five-time mechanism&lt;/li&gt;
&lt;li&gt;force_custom_plan: always hard parse; suitable for SQL with data skew where performance and stability are critical&lt;/li&gt;
&lt;li&gt;force_generic_plan: always use generic plan; suitable for SQL without data skew or where performance/stability requirements are lower&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PG 14 added generic_plans and custom_plans columns to pg_prepared_statements, showing counts for both plan types. Since PG execution plans are only cached in-process, pg_prepared_statements only shows the current session&amp;rsquo;s SQL, not other sessions or global info.&lt;/p&gt;
&lt;p&gt;Five-time mechanism source code:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * choose_custom_plan: choose whether to use custom or generic plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * This defines the policy followed by GetCachedPlan.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;choose_custom_plan&lt;/span&gt;(CachedPlanSource &lt;span style="color:#f92672"&gt;*&lt;/span&gt;plansource, ParamListInfo boundParams)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;double&lt;/span&gt;		avg_custom_cost;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Let settings force the decision */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plan_cache_mode &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PLAN_CACHE_MODE_FORCE_GENERIC_PLAN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plan_cache_mode &lt;span style="color:#f92672"&gt;==&lt;/span&gt; PLAN_CACHE_MODE_FORCE_CUSTOM_PLAN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* See if caller wants to force the decision */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cursor_options &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; CURSOR_OPT_GENERIC_PLAN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;cursor_options &lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt; CURSOR_OPT_CUSTOM_PLAN)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Generate custom plans until we have done at least 5 (arbitrary) */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;num_custom_plans &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	avg_custom_cost &lt;span style="color:#f92672"&gt;=&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;total_custom_cost &lt;span style="color:#f92672"&gt;/&lt;/span&gt; plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;num_custom_plans;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Prefer generic plan if it&amp;#39;s less expensive than the average custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * plan. (Because we include a charge for cost of planning in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * custom-plan costs, this means the generic plan only has to be less
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * expensive than the execution cost plus replan cost of the custom
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * plans.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Note that if generic_cost is -1 (indicating we&amp;#39;ve not yet determined
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the generic plan cost), we&amp;#39;ll always prefer generic at this point.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (plansource&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;generic_cost &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; avg_custom_cost)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/Hehuyi_In/article/details/128885660" target="_blank" rel="noreferrer"&gt;Hehuyi_In 软硬解析的概念&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;34. What Are VM / FSM / INIT Files
 &lt;div id="34-what-are-vm--fsm--init-files" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#34-what-are-vm--fsm--init-files" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d0c6c3c47a5b.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Numeric suffix:&lt;/strong&gt; Files fork when exceeding 1GB (default); changeable at build time via &lt;code&gt;./configure --with-segsize&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;VM:&lt;/strong&gt; Visibility map, containing all-visible and all-frozen info. Helps: 1) accelerate vacuum scanning (skip all-visible pages) 2) accelerate eager freeze (skip all-frozen pages) 3) support INDEX ONLY SCAN (all-visible pages don&amp;rsquo;t need page access for tuple visibility checks)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FSM:&lt;/strong&gt; Free space map, helping PG locate free space on pages. For index pages, since indexes are ordered, recording per-page free space is less meaningful; index FSM files only contain fully empty index pages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;INIT:&lt;/strong&gt; A fork file only for unlogged tables, size 0, marking the data file as unlogged.&lt;/p&gt;
&lt;p&gt;《postgresql-internals-14》&lt;/p&gt;

&lt;h3 class="relative group"&gt;35. Memory Reclaim Mechanisms: kswapd / Direct Memory Reclaim / pdflush
 &lt;div id="35-memory-reclaim-mechanisms-kswapd--direct-memory-reclaim--pdflush" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#35-memory-reclaim-mechanisms-kswapd--direct-memory-reclaim--pdflush" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Memory reclaim mechanisms:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Background memory reclaim (kswapd): When physical memory is tight, the kswapd kernel thread is woken to reclaim memory asynchronously, not blocking process execution.
Direct memory reclaim: If background async reclaim can&amp;rsquo;t keep up with process memory allocation requests, direct reclaim begins — synchronous, blocking process execution.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/664b2fe2f965.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://vivani.net/2022/06/14/linux-kernel-tuning-page-allocation-failure/)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pages_low:&lt;/strong&gt; When available free pages drop below pages_low, buddy allocator wakes &lt;strong&gt;kswapd&lt;/strong&gt;; kernel begins swapping pages to disk.
&lt;strong&gt;pages_min:&lt;/strong&gt; When available pages reach pages_min, page reclaim pressure is high because the memory zone urgently needs free pages. Allocator performs kswapd work synchronously — sometimes called direct reclaim.
&lt;strong&gt;pages_high:&lt;/strong&gt; Once kswapd is woken and releasing pages, only when available pages reach pages_high does the kernel consider the zone &amp;ldquo;balanced&amp;rdquo;. At pages_high, kswapd re-enters sleep. Free pages above pages_high mean the zone state is ideal.
Memory reclaim operates per-zone; /proc/zoneinfo shows min, low, high values.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;vm.min_free_kbytes&lt;/code&gt; (the min_pages line) is a critically important OS parameter. Very low values prevent effective system memory reclamation, potentially causing crashes and service interruptions. Excessively high values increase reclaim activity, causing allocation latency and potentially immediate out-of-memory states.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;pdflush and kcompactd:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;pdflush: pagecache dirty pages must be written to disk. Whether via sync (fsync etc.), OS-scheduled flushing, or database commits, ultimately the Linux kernel thread pdflush handles the flushing work.&lt;/p&gt;
&lt;p&gt;kcompactd: page compaction specifically targets memory fragmentation cleanup (flushing also works since memory returns to the buddy system). Unlike pdflush flushing, memory compaction doesn&amp;rsquo;t require disk writes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Observing memory reclaim:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;sar is one of the most comprehensive Linux system performance analysis tools, reporting on multiple dimensions: file read/write, syscall usage, disk I/O, CPU efficiency, memory usage, process activity, and IPC.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9f0b4a87e536.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;sar -B&lt;/code&gt; observes kswapd and direct memory reclaim:&lt;/p&gt;
&lt;p&gt;Example: sar viewing memory page status
&lt;code&gt;sar -B 1 3&lt;/code&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pgpgin/s: KB read from disk/SWAP into memory per second&lt;/li&gt;
&lt;li&gt;pgpgout/s: KB written from memory to disk/SWAP per second&lt;/li&gt;
&lt;li&gt;fault/s: page faults per second (major + minor)&lt;/li&gt;
&lt;li&gt;majflt/s: major page faults per second&lt;/li&gt;
&lt;li&gt;pgfree/s: pages placed in free queue per second&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pgscank/s: pages scanned by kswapd per second&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;pgscand/s: pages directly scanned per second&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;pgsteal/s: pages cleared from cache per second to meet memory needs&lt;/li&gt;
&lt;li&gt;%vmeff: percentage of stolen pages (pgsteal) vs total scanned (pgscank + pgscand)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Example: sar viewing historical memory info
&lt;code&gt;sar -B -s &amp;quot;08:00:00&amp;quot; -e &amp;quot;10:00:00&amp;quot;&lt;/code&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;# Without -e means from start time to now&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ sar -B -s &lt;span style="color:#e6db74"&gt;&amp;#34;08:00:00&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:45:01 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:46:01 PM 414429.37 395024.08 179478.63 0.07 352922.62 12003.78 4266.52 16269.42 99.99
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:47:01 PM 879907.08 337948.43 157970.97 0.02 402290.21 0.00 0.00 0.00 0.00
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;09:48:01 PM 772977.43 507343.30 150255.50 0.05 466742.08 0.00 5821.28 5821.27 100.00&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Strong recommendation: &lt;a href="https://blog.csdn.net/qq_40687433/article/details/135492312" target="_blank" rel="noreferrer"&gt;linux内存浅析&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;36. Process Scheduling, D Process Hazards and Causes
 &lt;div id="36-process-scheduling-d-process-hazards-and-causes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#36-process-scheduling-d-process-hazards-and-causes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Not fully understanding what &amp;ldquo;process scheduling&amp;rdquo; specifically refers to here; I&amp;rsquo;ll answer in terms of IPC (Inter-Process Communication).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IPC:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Since user space in virtual address space can&amp;rsquo;t be accessed by other user processes, achieving multi-process user access to the same memory data via kernel space inevitably involves context switching (as shown on the right below). Multi-process applications clearly need inter-process access, so a method enabling user processes to directly access the same physical memory emerged: shared memory (as shown on the left below).&lt;/p&gt;
&lt;p&gt;Shared memory is one IPC (Inter-Process Communication) mechanism; others include message queues and semaphores. Shared memory is one of the fastest IPC mechanisms because it doesn&amp;rsquo;t require inter-process data copying — processes access shared memory through their own address spaces.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d6a9535557f7.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://www.geeksforgeeks.org/inter-process-communication-ipc/）&lt;/p&gt;
&lt;p&gt;Shared memory has many implementations. In PG, shared_buffer defaults to mmap for shared memory (corresponds to &lt;code&gt;shared_memory_type&lt;/code&gt;); parallel queries default to POSIX (corresponds to &lt;code&gt;dynamic_shared_memory_type&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b2a8526ef63d.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;(&lt;a href="https://momjian.us/main/writings/pgsql/inside_shmem.pdf" target="_blank" rel="noreferrer"&gt;https://momjian.us/main/writings/pgsql/inside_shmem.pdf&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;D Process:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;D process meaning: Uninterruptible sleep state. Indicates the process is waiting for an external event to complete, such as disk I/O or network requests. Normally, D processes cannot be directly terminated.&lt;/p&gt;
&lt;p&gt;Causes of D processes: The process is waiting for an external event, typically direct memory reclaim — synchronous and blocking application disk access. At that moment, disk-access-related processes are in D state. Note: D processes are triggered at the OS or hardware level, largely unrelated to the application itself (a little). For example, a PG large query session itself won&amp;rsquo;t produce D processes and can be killed.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/135492312" target="_blank" rel="noreferrer"&gt;linux内存浅析&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/qq_40687433/article/details/135541103" target="_blank" rel="noreferrer"&gt;PostgreSQL内存浅析&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;37. Packet Capture and Analysis of PostgreSQL Protocol
 &lt;div id="37-packet-capture-and-analysis-of-postgresql-protocol" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#37-packet-capture-and-analysis-of-postgresql-protocol" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;PG supported protocols:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Connection protocols:&lt;/li&gt;
&lt;/ol&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;TCP/IP:&lt;/strong&gt; PostgreSQL&amp;rsquo;s most common communication method, allowing client-server network connections and data exchange.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unix domain socket:&lt;/strong&gt; For same-host client-server connections, faster than TCP/IP.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SSL/TLS:&lt;/strong&gt; PostgreSQL supports SSL/TLS encryption on TCP/IP connections for data transmission security. TLS is SSL&amp;rsquo;s successor; PG (seemingly) no longer supports SSL protocol itself, though related parameters remain for TLS use.&lt;/li&gt;
&lt;/ul&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Password authentication protocols:&lt;/li&gt;
&lt;/ol&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;MD5:&lt;/strong&gt; As the earlier default password authentication protocol, MD5 (Message Digest Algorithm 5) stores and verifies user passwords server-side.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SCRAM-SHA-256:&lt;/strong&gt; A more secure authentication protocol using SHA-256 hashing and challenge-response for user authentication. PG10+ gradually replaces MD5.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Simple packet capture analysis:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;tcpdump capture command:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tcpdump tcp port &lt;span style="color:#ae81ff"&gt;5432&lt;/span&gt; -i lo -s0 -nSX -vvv&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Capture a count(*) (already connected to database via psql -h):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&amp;gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; count&lt;span style="color:#f92672"&gt;(&lt;/span&gt;*&lt;span style="color:#f92672"&gt;)&lt;/span&gt; from t1; -- just capture this
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; count 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Captured content:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;828820&lt;/span&gt; IP (tos &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0, ttl &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;, id &lt;span style="color:#ae81ff"&gt;29027&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, flags [DF], proto TCP (&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;), &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;82&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37240&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.postgres: Flags [P.], cksum &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x6d13 (incorrect &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x57c6), seq &lt;span style="color:#ae81ff"&gt;1091052893&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1091052923&lt;/span&gt;, ack &lt;span style="color:#ae81ff"&gt;3014367256&lt;/span&gt;, win &lt;span style="color:#ae81ff"&gt;350&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;options&lt;/span&gt; [nop,nop,TS val &lt;span style="color:#ae81ff"&gt;92480460&lt;/span&gt; ecr &lt;span style="color:#ae81ff"&gt;92427582&lt;/span&gt;], &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000: &lt;span style="color:#ae81ff"&gt;4500&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0052&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7163&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4006&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;c74 ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 E..Rqc&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;t...U
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0010: ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 &lt;span style="color:#ae81ff"&gt;9178&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1538&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4108&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;255&lt;/span&gt;d b3ab &lt;span style="color:#ae81ff"&gt;9818&lt;/span&gt; ...U.x.&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;A.&lt;span style="color:#f92672"&gt;%&lt;/span&gt;]....
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0020: &lt;span style="color:#ae81ff"&gt;8018&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;015&lt;/span&gt;e &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;d13 &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0101&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;080&lt;/span&gt;a &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cc ...&lt;span style="color:#f92672"&gt;^&lt;/span&gt;m.........&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0030: &lt;span style="color:#ae81ff"&gt;0582&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;553&lt;/span&gt;e &lt;span style="color:#ae81ff"&gt;5100&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;d73 &lt;span style="color:#ae81ff"&gt;656&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6563&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7420&lt;/span&gt; ..U&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;Q....&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0040: &lt;span style="color:#ae81ff"&gt;636&lt;/span&gt;f &lt;span style="color:#ae81ff"&gt;756&lt;/span&gt;e &lt;span style="color:#ae81ff"&gt;7428&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;a29 &lt;span style="color:#ae81ff"&gt;2066&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;726&lt;/span&gt;f &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;d20 &lt;span style="color:#ae81ff"&gt;7431&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;).&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;.t1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0050: &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;b00 ;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;830090&lt;/span&gt; IP (tos &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0, ttl &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;, id &lt;span style="color:#ae81ff"&gt;49370&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, flags [DF], proto TCP (&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;), &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;115&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.postgres &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37240&lt;/span&gt;: Flags [P.], cksum &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x6d34 (incorrect &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x6e5c), seq &lt;span style="color:#ae81ff"&gt;3014367256&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3014367319&lt;/span&gt;, ack &lt;span style="color:#ae81ff"&gt;1091052923&lt;/span&gt;, win &lt;span style="color:#ae81ff"&gt;342&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;options&lt;/span&gt; [nop,nop,TS val &lt;span style="color:#ae81ff"&gt;92480461&lt;/span&gt; ecr &lt;span style="color:#ae81ff"&gt;92480460&lt;/span&gt;], &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;63&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000: &lt;span style="color:#ae81ff"&gt;4500&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0073&lt;/span&gt; c0da &lt;span style="color:#ae81ff"&gt;4000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4006&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;cdc ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 E..s..&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#f92672"&gt;@&lt;/span&gt;......U
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0010: ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 &lt;span style="color:#ae81ff"&gt;1538&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;9178&lt;/span&gt; b3ab &lt;span style="color:#ae81ff"&gt;9818&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4108&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;257&lt;/span&gt;b ...U.&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.x....A.&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0020: &lt;span style="color:#ae81ff"&gt;8018&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0156&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;d34 &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0101&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;080&lt;/span&gt;a &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cd ...Vm4........&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0030: &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cc &lt;span style="color:#ae81ff"&gt;5400&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;e00 &lt;span style="color:#ae81ff"&gt;0163&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;f75 &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;e74 ..&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.T......&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0040: &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1400&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;ff ffff ................
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0050: ff00 &lt;span style="color:#ae81ff"&gt;0044&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;000&lt;/span&gt;b &lt;span style="color:#ae81ff"&gt;0001&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0001&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3443&lt;/span&gt; ...D..........&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;C&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0060: &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;000&lt;/span&gt;d &lt;span style="color:#ae81ff"&gt;5345&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;c45 &lt;span style="color:#ae81ff"&gt;4354&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2031&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;005&lt;/span&gt;a &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; ....&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.Z..
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0070: &lt;span style="color:#ae81ff"&gt;0005&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; ..I
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;51&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;830098&lt;/span&gt; IP (tos &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0, ttl &lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;, id &lt;span style="color:#ae81ff"&gt;29028&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, flags [DF], proto TCP (&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;), &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;52&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37240&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt;.postgres: Flags [.], cksum &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x6cf5 (incorrect &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x5cb9), seq &lt;span style="color:#ae81ff"&gt;1091052923&lt;/span&gt;, ack &lt;span style="color:#ae81ff"&gt;3014367319&lt;/span&gt;, win &lt;span style="color:#ae81ff"&gt;350&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;options&lt;/span&gt; [nop,nop,TS val &lt;span style="color:#ae81ff"&gt;92480461&lt;/span&gt; ecr &lt;span style="color:#ae81ff"&gt;92480461&lt;/span&gt;], &lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0000: &lt;span style="color:#ae81ff"&gt;4500&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0034&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;7164&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4006&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;c91 ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 E..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;qd&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#f92672"&gt;@&lt;/span&gt;.&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;....U
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0010: ac12 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;a55 &lt;span style="color:#ae81ff"&gt;9178&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1538&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4108&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;257&lt;/span&gt;b b3ab &lt;span style="color:#ae81ff"&gt;9857&lt;/span&gt; ...U.x.&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;A.&lt;span style="color:#f92672"&gt;%&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;...W
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0020: &lt;span style="color:#ae81ff"&gt;8010&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;015&lt;/span&gt;e &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;cf5 &lt;span style="color:#ae81ff"&gt;0000&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0101&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;080&lt;/span&gt;a &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cd ...&lt;span style="color:#f92672"&gt;^&lt;/span&gt;l.........&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x0030: &lt;span style="color:#ae81ff"&gt;0583&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;23&lt;/span&gt;cd ..&lt;span style="color:#f92672"&gt;#&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Reading packets visually&amp;hellip; simple analysis shows this count statement only generated 3 packets, and you can even see the &lt;code&gt;select.count(*).from.t1&lt;/code&gt; statement.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Wireshark packet analysis:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Window 1:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tcpdump tcp port &lt;span style="color:#ae81ff"&gt;5432&lt;/span&gt; -i lo -s0 -nSX -vvv -w tcpdump.cap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Window 2:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#f92672"&gt;@&lt;/span&gt;iZ2vcdugd3f2h0t7x20pqmZ &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; psql &lt;span style="color:#f92672"&gt;-&lt;/span&gt;h &lt;span style="color:#ae81ff"&gt;172&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;85&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;p &lt;span style="color:#ae81ff"&gt;5432&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#f92672"&gt;-&lt;/span&gt;U lzl &lt;span style="color:#75715e"&gt;-- step 1, connect
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Password &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; lzl: &lt;span style="color:#75715e"&gt;-- step 2, enter password
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t1; &lt;span style="color:#75715e"&gt;-- step 3, query
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;q &lt;span style="color:#75715e"&gt;-- step 4, exit&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note 4 steps, corresponding to at least 4 packet sections:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Step 1 - connection request&lt;/li&gt;
&lt;li&gt;Step 2 - password entry&lt;/li&gt;
&lt;li&gt;Step 3 - SQL query&lt;/li&gt;
&lt;li&gt;Step 4 - disconnect&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Now analyze tcpdump.cap with &lt;a href="https://www.wireshark.org/download.html" target="_blank" rel="noreferrer"&gt;Wireshark&lt;/a&gt;.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Step 1 - Connection Request [1-10] — TCP three-way handshake [1-3]:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d79f57937f52.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;37282-&amp;gt;5432 sends SYN, seq=0&lt;/li&gt;
&lt;li&gt;5432-&amp;gt;37282 sends SYN+ACK, seq=0 ack=1&lt;/li&gt;
&lt;li&gt;37282-&amp;gt;5432 sends ACK, seq=1 ack=1&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2c2b65905d60.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（https://www.researchgate.net/publication/340247809_Computer_Network_Chapter_8_Transport_Layer_UDP_and_TCP）&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Step 1 - Connection Request [1-10] — PGSQL protocol startup and authentication request [4-7]:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f1fe82c3a6ce.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;After the three-way handshake, PSQL client immediately sends a PGSQL protocol startup message to PG server [4], info: &amp;gt;?, the protocol startup message.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/376c522b4dd8.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;The above &amp;gt;? packet is 37282-&amp;gt;5432. You don&amp;rsquo;t need to check source/destination in Transmission Control Protocol. PGSQL protocol shows even less info than TCP, but it has direction: &amp;gt; means 37282-&amp;gt;5432, &amp;lt; means 37282&amp;lt;-5432.&lt;/p&gt;
&lt;p&gt;Next PGSQL protocol message is authentication request [6], info: &amp;lt;R, 37282&amp;lt;-5432.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e0135443ee2d.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Step 1 - Connection Request [1-10] — Three-way FIN [8-10]. After server sends PGSQL authentication request to client, client requests TCP disconnect, 3 TCP FINs (not 4; explained below). Note: at this point psql command line is waiting for password input&amp;hellip;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7a7094b99256.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Step 2 - Password Entry [11-22] — Three-way handshake [11-13]. Because the first TCP connection ended, establishing a connection again starts from TCP&amp;hellip; so another three-way handshake:&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/dd0ed1d5110b.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Step 2 - Password Entry [11-22] — Password authentication [14-22]. Authentication phase is slightly more complex. [14-16] essentially does the same as [4-7] in step 1: client requests PGSQL protocol startup, server returns authentication request. Then [18-20] performs password authentication using &lt;strong&gt;SCRAM-SHA-256&lt;/strong&gt; mechanism; password authentication actually transmits 4 packets, including [21]&amp;rsquo;s two R authentication messages. Then [21] connection established: first two R&amp;rsquo;s are authentication complete; many S&amp;rsquo;s represent Parameter status: application name, charset, timezone, etc.; K represents Backend key, returning forked backend PID; Z represents ready for query.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/969f8aba1e26.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Step 3 - SQL Query [23-25]&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d68d274be7ab.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;[23] Q clearly represents Query, client sends packet containing SQL; [24] returns results: T represents Row Description (here only column name &amp;ldquo;count&amp;rdquo;); D represents data row, the count result is 4, data is plaintext unencrypted:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/49c15b851c3e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;C represents Command complete; Z represents ready.&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Step 4 - Disconnect [26-29]. [26] client actively sends session end message, PGSQL protocol (corresponds to \q); [27-29] again 3 TCP FINs.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9fb2ec1fbc1a.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;Why three FINs instead of four?&lt;/p&gt;
&lt;p&gt;&amp;ldquo;No more data to send&amp;rdquo; AND &amp;ldquo;TCP delayed ACK mechanism enabled&amp;rdquo; means the second and third FINs merge, resulting in three FINs:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d0fa97105c11.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;（&lt;a href="https://www.xiaolincoding.com/network/3_tcp/tcp_three_fin.html#tcp-%E5%9B%9B%E6%AC%A1%E6%8C%A5%E6%89%8B" target="_blank" rel="noreferrer"&gt;TCP 四次挥手，可以变成三次吗？&lt;/a&gt;）&lt;/p&gt;
&lt;p&gt;Since TCP delayed ACK is enabled by default, three-FIN scenarios appear more often than four-FIN in captures.&lt;/p&gt;
&lt;p&gt;OK, simple PG packet capture and analysis complete. Summary network transmission diagram for this session:



&lt;img src="https://lastdba.com/img/csdn/8c246946c86e.png" alt="Insert image description here" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Packet capture analysis notes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;First understand the link; typically many nodes exist between application clients and database servers: network switches, request forwarding services, etc.&lt;/li&gt;
&lt;li&gt;Capture on both ends simultaneously when possible&lt;/li&gt;
&lt;li&gt;Pay attention to capture timing and set appropriate filters&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Possible packet loss points:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/dF4juaW-ttI0Zn1j0z6tag" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/dF4juaW-ttI0Zn1j0z6tag&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Packet loss&lt;/strong&gt; involves NICs, drivers, and kernel protocol stack — each layer can lose packets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Between two VM connections, transmission failures may occur: network congestion, line errors, etc.&lt;/li&gt;
&lt;li&gt;After NIC receives packets, the ring buffer may overflow and drop packets&lt;/li&gt;
&lt;li&gt;At IP layer: routing failures, packet size exceeding MTU, etc.&lt;/li&gt;
&lt;li&gt;At transport layer: port not listening, resource usage exceeding kernel limits, etc.&lt;/li&gt;
&lt;li&gt;At socket layer: socket buffer overflow and packet loss&lt;/li&gt;
&lt;li&gt;At application layer: application exceptions causing packet loss&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;References:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.twblogs.net/a/5cbca833bd9eee0eff4612ff/?lang=zh-cn" target="_blank" rel="noreferrer"&gt;Tcpdump一次抓包记录（Postgresql通信）&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/dF4juaW-ttI0Zn1j0z6tag" target="_blank" rel="noreferrer"&gt;学徒 DBA必备技能之网络丢包分析总结&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://pigsty.cc/zh/blog/2018/01/05/pgsql%E5%8D%8F%E8%AE%AE%E5%88%86%E6%9E%90%E7%BD%91%E7%BB%9C%E6%8A%93%E5%8C%85/" target="_blank" rel="noreferrer"&gt;PgSQL协议分析:网络抓包&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://www.xiaolincoding.com/network/3_tcp/tcp_three_fin.html#tcp-%E5%9B%9B%E6%AC%A1%E6%8C%A5%E6%89%8B" target="_blank" rel="noreferrer"&gt;TCP 四次挥手，可以变成三次吗？&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;38. Storage: SAN / NAS / DAS
 &lt;div id="38-storage-san--nas--das" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#38-storage-san--nas--das" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c25bdda59915.png" alt="Insert image description here" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;39. Lifecycle of an IO Request
 &lt;div id="39-lifecycle-of-an-io-request" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#39-lifecycle-of-an-io-request" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/4e36321eb908.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;（https://blog.csdn.net/Hehuyi_In/article/details/100715177?spm=1001.2014.3001.5501）&lt;/p&gt;</content:encoded></item><item><title>PostgreSQL Localization</title><link>https://lastdba.com/en/2024/08/12/postgresql-localization/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/postgresql-localization/</guid><description>&lt;h2 class="relative group"&gt;Localization Concepts
 &lt;div id="localization-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#localization-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The purpose of localization is to support the language features and rules of different countries and regions. With localization support, you can use character sets that handle Chinese, French, Japanese, and more. Beyond character sets, there are also character sorting rules and other language-related rule support. For example, we know how to sort (&amp;lsquo;a&amp;rsquo;, &amp;lsquo;b&amp;rsquo;), but how should (&amp;lsquo;a&amp;rsquo;, &amp;lsquo;A&amp;rsquo;) and (&amp;lsquo;啊&amp;rsquo;, &amp;lsquo;阿&amp;rsquo;) be sorted?&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Localization Concepts
 &lt;div id="localization-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#localization-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The purpose of localization is to support the language features and rules of different countries and regions. With localization support, you can use character sets that handle Chinese, French, Japanese, and more. Beyond character sets, there are also character sorting rules and other language-related rule support. For example, we know how to sort (&amp;lsquo;a&amp;rsquo;, &amp;lsquo;b&amp;rsquo;), but how should (&amp;lsquo;a&amp;rsquo;, &amp;lsquo;A&amp;rsquo;) and (&amp;lsquo;啊&amp;rsquo;, &amp;lsquo;阿&amp;rsquo;) be sorted?&lt;/p&gt;
&lt;p&gt;If you search Google for information about localization, character sets, and collation, you might end up with knowledge that feels both complex and distant. The best teacher is still 


&lt;img src="https://lastdba.com/img/csdn/4a8579e2070f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Localization knowledge is divided into three parts: locale support, collation, and character sets.&lt;/p&gt;

&lt;h2 class="relative group"&gt;locale
 &lt;div id="locale" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#locale" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL&amp;rsquo;s localization is provided by the operating system. You need to check whether the OS supports it via &lt;code&gt;locale -a&lt;/code&gt;. The locale can be specified when initializing the database:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can also set localization subcategories individually: string sort order, character classification, numeric formatting, date formatting, time formatting, currency formatting, etc.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;zh_CN --lc-monetary&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;All localization subcategories:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Subcategory&lt;/th&gt;
 &lt;th&gt;Rule&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_COLLATE&lt;/td&gt;
 &lt;td&gt;String sort order&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_CTYPE&lt;/td&gt;
 &lt;td&gt;Character classification (What is a letter? Its upper-case equivalent?)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_MESSAGES&lt;/td&gt;
 &lt;td&gt;Language of messages&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_MONETARY&lt;/td&gt;
 &lt;td&gt;Formatting of currency amounts&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_NUMERIC&lt;/td&gt;
 &lt;td&gt;Formatting of numbers&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;LC_TIME&lt;/td&gt;
 &lt;td&gt;Formatting of dates and times&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;These subcategories can be split into two groups. &lt;code&gt;lc_messages&lt;/code&gt;, &lt;code&gt;lc_monetary&lt;/code&gt;, &lt;code&gt;lc_numeric&lt;/code&gt;, and &lt;code&gt;lc_time&lt;/code&gt; can be adjusted via parameters after initialization. &lt;code&gt;LC_COLLATE&lt;/code&gt; and &lt;code&gt;LC_CTYPE&lt;/code&gt; belong to collation — see the collation section for adjustment details.&lt;/p&gt;
&lt;p&gt;Locale settings affect the following behaviors:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Sort order in queries using &lt;code&gt;ORDER BY&lt;/code&gt; or the standard comparison operators on textual data&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;upper&lt;/code&gt;, &lt;code&gt;lower&lt;/code&gt;, and &lt;code&gt;initcap&lt;/code&gt; functions&lt;/li&gt;
&lt;li&gt;Pattern matching operators (&lt;code&gt;LIKE&lt;/code&gt;, &lt;code&gt;SIMILAR TO&lt;/code&gt;, and POSIX-style regular expressions); locales affect both case insensitive matching and the classification of characters by character-class regular expressions&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;to_char&lt;/code&gt; family of functions&lt;/li&gt;
&lt;li&gt;The ability to use indexes with &lt;code&gt;LIKE&lt;/code&gt; clauses&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;COLLATION
 &lt;div id="collation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#collation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Collation defines the sort order of characters and character classification behavior. Some database operators depend on collation, such as &lt;code&gt;ORDER BY&lt;/code&gt;, &lt;code&gt;lower&lt;/code&gt;, &lt;code&gt;upper&lt;/code&gt;, &lt;code&gt;initcap&lt;/code&gt;, &lt;code&gt;to_char&lt;/code&gt;, and others.&lt;/p&gt;
&lt;p&gt;Use the following SQL to query the system table &lt;code&gt;pg_collation&lt;/code&gt; to get &lt;code&gt;LC_COLLATE&lt;/code&gt; and &lt;code&gt;LC_CTYPE&lt;/code&gt; information for supported character sets:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_encoding_to_char(collencoding) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt;,collname,collcollate,collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;default&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;C&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;POSIX&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;en_US.utf8&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.utf8&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;zh_SG.gb2312&amp;#39;&lt;/span&gt;) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collctype 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+--------------+--------------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; POSIX &lt;span style="color:#f92672"&gt;|&lt;/span&gt; POSIX &lt;span style="color:#f92672"&gt;|&lt;/span&gt; POSIX
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_SG.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_SG.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_SG.gb2312&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;encoding&lt;/code&gt; is the character set, and &lt;code&gt;collname&lt;/code&gt; is the collation name.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When &lt;code&gt;encoding&lt;/code&gt; is empty, it means this collation supports all character sets.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;default&lt;/code&gt;, &lt;code&gt;C&lt;/code&gt;, &lt;code&gt;POSIX&lt;/code&gt; are collations supported on all platforms, provided by &lt;code&gt;libc&lt;/code&gt;. Other collations depend on whether the operating system supports them (&lt;code&gt;locale -a&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;default&lt;/code&gt; means using the collation set at database creation time, which can be viewed via &lt;code&gt;\l&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;C&lt;/code&gt; is semantically equivalent to &lt;code&gt;POSIX&lt;/code&gt;, but PostgreSQL still considers them different collations. They both compare characters by ASCII code, strictly by byte order.&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;POSIX&amp;#34;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;P21: &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt; mismatch &lt;span style="color:#66d9ef"&gt;between&lt;/span&gt; explicit collations &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;POSIX&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;LINE &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;POSIX&amp;#34;&lt;/span&gt; ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: merge_collation_state, parse_collate.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;834&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;UTF8 is the most common character set, and the most common language environments are &lt;code&gt;en_US&lt;/code&gt; and &lt;code&gt;zh_CN&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;You can create custom collations via &lt;code&gt;CREATE COLLATION ...&lt;/code&gt;. However, cases where &lt;code&gt;LC_COLLATE&lt;/code&gt; and &lt;code&gt;LC_CTYPE&lt;/code&gt; differ are very rare.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;LC_COLLATE
 &lt;div id="lc_collate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lc_collate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;LC_COLLATE&lt;/code&gt; affects character comparison (sorting, character operations, etc.).&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;COLLATE&lt;/code&gt; clause can transform the collation of an expression:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;expr &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note that this specifies a &lt;em&gt;collation&lt;/em&gt;, not &lt;code&gt;lc_collate&lt;/code&gt;. If no collation is explicitly specified, the database uses the column&amp;rsquo;s collation by default. If the column has no collation specified, it uses the database&amp;rsquo;s default collation.&lt;/p&gt;
&lt;p&gt;Sorting test with different collations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; l(col1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; l(col1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;), (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;)) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; l(col1)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;These three different collations have different &lt;code&gt;lc_collate&lt;/code&gt; values, and the sort methods are indeed different — we can see three distinct sort results from the output.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Why does collation C put &amp;lsquo;A&amp;rsquo; before &amp;lsquo;a&amp;rsquo;?&lt;/strong&gt;
Collation C uses ASCII encoding order. In ASCII, uppercase letters come before lowercase. Meanwhile, &lt;code&gt;en_US.utf8&lt;/code&gt; and &lt;code&gt;zh_CN.utf8&lt;/code&gt; clearly do not follow this order for English letters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Order of Chinese characters&lt;/strong&gt;
Even with the same UTF8 character set, the order of Chinese characters differs between Chinese and English locales. Different &lt;code&gt;lc_collate&lt;/code&gt; values correspond to different alphabets for different localized languages. The sort order with &lt;code&gt;lc_collate=C&lt;/code&gt; is always by byte order. Although ASCII does not include Chinese, C can still sort Chinese — (essentially) every Chinese character maps to a UTF8 encoding, and C sorts by byte order.&lt;/p&gt;

&lt;h3 class="relative group"&gt;LC_CTYPE
 &lt;div id="lc_ctype" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lc_ctype" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;LC_CTYPE&lt;/code&gt; affects character operations (such as &lt;code&gt;upper&lt;/code&gt;, &lt;code&gt;initcap&lt;/code&gt;, etc.).&lt;/p&gt;
&lt;p&gt;If the string is all English, e.g., &lt;code&gt;'abcD'&lt;/code&gt;, &lt;code&gt;initcap&lt;/code&gt; converts it to &lt;code&gt;'Abcd'&lt;/code&gt; under all three collations — nothing special to show here.&lt;/p&gt;
&lt;p&gt;But when Chinese is introduced, the results differ:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; initcap(&lt;span style="color:#e6db74"&gt;&amp;#39;啊aAAa阿bBBb&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; initcap 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;Aaaa阿Bbbb
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; initcap(&lt;span style="color:#e6db74"&gt;&amp;#39;啊aAAa阿aAAa&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; initcap 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;aaaa阿aaaa
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; initcap(&lt;span style="color:#e6db74"&gt;&amp;#39;啊aAAa阿aAAa&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; initcap 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;啊&lt;/span&gt;aaaa阿aaaa&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;LC_CTYPE=C&lt;/code&gt;, &lt;code&gt;initcap&lt;/code&gt; capitalizes the first letter of every non-contiguous English character sequence, whereas &lt;code&gt;en_US.utf8&lt;/code&gt; and &lt;code&gt;zh_CN.utf8&lt;/code&gt; only capitalize the very first character (Chinese characters remain unchanged) and lowercase other English characters.&lt;/p&gt;
&lt;p&gt;The behavior of &lt;code&gt;initcap&lt;/code&gt; with Chinese may be an undefined requirement, but we can conclude: &lt;strong&gt;different &lt;code&gt;LC_CTYPE&lt;/code&gt; settings lead to different results from character-sensitive functions like &lt;code&gt;initcap&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Furthermore, Chinese is case-insensitive, but some other localized languages do have case distinctions — different &lt;code&gt;LC_CTYPE&lt;/code&gt; settings lead to even more complex outcomes.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Character Sets
 &lt;div id="character-sets" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#character-sets" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Character Set Basics
 &lt;div id="character-set-basics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#character-set-basics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL supports different character sets (also called encodings). Character sets and collation are two separate concepts, but the character set must be compatible with &lt;code&gt;LC_CTYPE&lt;/code&gt; and &lt;code&gt;LC_COLLATE&lt;/code&gt;. As seen in &lt;code&gt;pg_collation&lt;/code&gt;, C/POSIX support all character sets, while other collations only support one character set (on Linux systems).&lt;/p&gt;
&lt;p&gt;Chinese-related character sets available in PostgreSQL:
*(&lt;em&gt;The C collation is provided by the libc library; some collations can be provided by the ICU library, requiring compilation in advance.)&lt;/em&gt;&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Name&lt;/th&gt;
 &lt;th&gt;Description&lt;/th&gt;
 &lt;th&gt;Language&lt;/th&gt;
 &lt;th&gt;Server-side support?&lt;/th&gt;
 &lt;th&gt;ICU support?&lt;/th&gt;
 &lt;th&gt;Bytes/Char&lt;/th&gt;
 &lt;th&gt;Aliases&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;BIG5&lt;/td&gt;
 &lt;td&gt;Big Five&lt;/td&gt;
 &lt;td&gt;Traditional Chinese&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;1–2&lt;/td&gt;
 &lt;td&gt;WIN950, Windows950&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;EUC_CN&lt;/td&gt;
 &lt;td&gt;Extended UNIX Code-CN&lt;/td&gt;
 &lt;td&gt;Simplified Chinese&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;1–3&lt;/td&gt;
 &lt;td&gt;GB2312&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GB18030&lt;/td&gt;
 &lt;td&gt;National Standard&lt;/td&gt;
 &lt;td&gt;Chinese&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;1–4&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GBK&lt;/td&gt;
 &lt;td&gt;Extended National Standard&lt;/td&gt;
 &lt;td&gt;Simplified Chinese&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;No&lt;/td&gt;
 &lt;td&gt;1–2&lt;/td&gt;
 &lt;td&gt;WIN936, Windows936&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;td&gt;Unicode, 8-bit&lt;/td&gt;
 &lt;td&gt;all&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;Yes&lt;/td&gt;
 &lt;td&gt;1–4&lt;/td&gt;
 &lt;td&gt;Unicode&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Traditional Chinese&lt;/strong&gt;:
&lt;a href="https://baike.baidu.com/item/%E5%A4%A7%E4%BA%94%E7%A0%81/2413431?fr=ge_ala" target="_blank" rel="noreferrer"&gt;BIG5&lt;/a&gt; is the most common character set standard for Traditional Chinese. It was once the industry standard and was later incorporated as a national standard.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simplified Chinese&lt;/strong&gt;:
GB stands for &amp;ldquo;Guobiao&amp;rdquo; (national standard). GB2312, GB18030, and GBK are all Chinese national character set standards. Due to issues such as rare characters and years of development producing several historical versions, there appear to be multiple standards.
&lt;a href="https://baike.baidu.com/item/EUC-CN/4514294?fr=ge_ala" target="_blank" rel="noreferrer"&gt;EUC_CN&lt;/a&gt; stands for Extended UNIX Code-CN, which is essentially &lt;a href="https://baike.baidu.com/item/%E4%BF%A1%E6%81%AF%E4%BA%A4%E6%8D%A2%E7%94%A8%E6%B1%89%E5%AD%97%E7%BC%96%E7%A0%81%E5%AD%97%E7%AC%A6%E9%9B%86/8074272?fromModule=lemma_inlink&amp;amp;fromtitle=GB2312&amp;amp;fromid=483170" target="_blank" rel="noreferrer"&gt;GB2312&lt;/a&gt;, but it cannot handle all rare characters either. Similarly named encodings include EUC_KR, EUC_JP, EUC_TW, and so on.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;International Standards&lt;/strong&gt;:
The character sets above are all national standards — they support English and Chinese but not other languages. The international standard that supports all languages of the world is &lt;a href="https://home.unicode.org/" target="_blank" rel="noreferrer"&gt;Unicode&lt;/a&gt; (which even includes emoji &amp;#x1f44d;). (There is also the well-known international standards organization ISO, which maintains character sets as well — there is some overlap, but we&amp;rsquo;ll set ISO aside for now.)&lt;/p&gt;
&lt;p&gt;Due to different Unicode encoding schemes, there are three encoding formats: UTF-8, UTF-16, and UTF-32.&lt;/p&gt;
&lt;p&gt;UTF-8 encoding format:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Bytes&lt;/th&gt;
 &lt;th&gt;Format&lt;/th&gt;
 &lt;th&gt;Actual encoding bits&lt;/th&gt;
 &lt;th&gt;Code point range&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1 byte&lt;/td&gt;
 &lt;td&gt;0xxxxxxx&lt;/td&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;0 ~ 127&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2 byte&lt;/td&gt;
 &lt;td&gt;110xxxxx 10xxxxxx&lt;/td&gt;
 &lt;td&gt;11&lt;/td&gt;
 &lt;td&gt;128 ~ 2047&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3 byte&lt;/td&gt;
 &lt;td&gt;1110xxxx 10xxxxxx 10xxxxxx&lt;/td&gt;
 &lt;td&gt;16&lt;/td&gt;
 &lt;td&gt;2048 ~ 65535&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4 byte&lt;/td&gt;
 &lt;td&gt;11110xxx 10xxxxxx 10xxxxxx 10xxxxxx&lt;/td&gt;
 &lt;td&gt;21&lt;/td&gt;
 &lt;td&gt;65536 ~ 2097151&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;UTF8 encoding is variable-length.
For characters in the range 0x00-0x7F (1 byte), UTF-8 encoding is exactly identical to ASCII (American Standard Code for Information Interchange). Therefore, UTF-8 is fully backward-compatible with ASCII.&lt;/p&gt;
&lt;p&gt;Due to shared origins, meanings, and similarities, Chinese, Japanese, Korean, and Vietnamese characters use a unified encoding in Unicode called &lt;a href="https://baike.baidu.com/item/%E4%B8%AD%E6%97%A5%E9%9F%A9%E8%B6%8A%E7%BB%9F%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97/1301611?fromModule=lemma_inlink" target="_blank" rel="noreferrer"&gt;CJK Unified Ideographs (CJKV Unified Ideographs)&lt;/a&gt;.
CJK Unified Ideographs encoding ranges: 3400-4DBF/4E00-9FFF/20000-3FFFF.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f914ea2ca52f.png" alt="" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Character Set Conversion
 &lt;div id="character-set-conversion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#character-set-conversion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When &lt;code&gt;server_encoding&lt;/code&gt; and &lt;code&gt;client_encoding&lt;/code&gt; differ, automatic conversion of the character set returned by the server can occur. For setting server-side and client-side character sets, see the &amp;ldquo;Configuring Character Sets&amp;rdquo; section.&lt;/p&gt;
&lt;p&gt;Chinese-related character sets — Server/Client convertible table:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Server Character Set&lt;/th&gt;
 &lt;th&gt;Available Client Character Sets&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;BIG5&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;not supported as a server encoding&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;EUC_CN (GB2312)&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;EUC_CN (GB2312), &lt;code&gt;MULE_INTERNAL&lt;/code&gt;, &lt;code&gt;UTF8&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GB18030&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;not supported as a server encoding&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GBK&lt;/td&gt;
 &lt;td&gt;&lt;em&gt;not supported as a server encoding&lt;/em&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;UTF8&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;all supported encodings&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;GB18030 and GBK are not supported on the server side, so in practice only EUC_CN (GB2312) and UTF8 can perform Server/Client conversion.&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The above lists the character sets that &lt;em&gt;can&lt;/em&gt; be converted, but conversion still requires CONVERSION support. PostgreSQL has built-in conversion functions visible via &lt;code&gt;pg_conversion&lt;/code&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Conversion Name&lt;/th&gt;
 &lt;th&gt;Source Encoding&lt;/th&gt;
 &lt;th&gt;Destination Encoding&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;big5_to_utf8&lt;/td&gt;
 &lt;td&gt;BIG5&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;euc_cn_to_utf8&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;EUC_CN&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;UTF8&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;gb18030_to_utf8&lt;/td&gt;
 &lt;td&gt;GB18030&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;gbk_to_utf8&lt;/td&gt;
 &lt;td&gt;GBK&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;utf8_to_big5&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;td&gt;BIG5&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;strong&gt;utf8_to_euc_cn&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;UTF8&lt;/strong&gt;&lt;/td&gt;
 &lt;td&gt;&lt;strong&gt;EUC_CN&lt;/strong&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;utf8_to_gb18030&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;td&gt;GB18030&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;utf8_to_gbk&lt;/td&gt;
 &lt;td&gt;UTF8&lt;/td&gt;
 &lt;td&gt;GBK&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;You can create custom conversions via the &lt;code&gt;CREATE CONVERSION&lt;/code&gt; statement, specifying the conversion function.&lt;/p&gt;
&lt;p&gt;Some character sets appear to be interconvertible, but the server side doesn&amp;rsquo;t support storing them at all (such as BIG5, GB18030, GBK), so it&amp;rsquo;s not practically useful. All we need to know here is that &lt;code&gt;euc_cn&lt;/code&gt; and &lt;code&gt;utf8&lt;/code&gt; can be converted to/from each other.&lt;/p&gt;
&lt;p&gt;Without CONVERSION support, conversion cannot happen:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- EUC_CN database
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; EUC_KR
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;EUC_KR: invalid &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;conversion&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;procedure&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;found&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Character set conversion test&lt;/strong&gt;:
&lt;em&gt;Pay attention to the client-side character set settings (e.g., CRT&amp;rsquo;s &amp;ldquo;session&amp;rdquo; - &amp;ldquo;Appearance&amp;rdquo; - &amp;ldquo;Character encoding&amp;rdquo;)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There are at least three endpoints with character set concepts: database server, database client, and UI client. CONVERSION only controls: database server → database client.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Server with UTF8 conversion test:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; zh(col1 varchar(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;&amp;gt;&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;); &lt;span style="color:#75715e"&gt;-- 〇 (líng) is a Chinese character
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- If CRT is not set to UTF8, Chinese characters are all garbled; only set CRT to UTF8 for insertion
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; server_encoding;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; server_encoding 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; client_encoding;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; client_encoding 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With no conversion at all, UTF8 displays correctly. Currently three endpoints: UTF8 - UTF8 - UTF8
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;阿&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;〇&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Switch database client character set. Now three endpoints: UTF8 - EUC_CN - UTF8
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; EUC_CN; &lt;span style="color:#75715e"&gt;-- Set client character set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22021&lt;/span&gt;: invalid byte sequence &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe9 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x98
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: report_invalid_encoding, mbutils.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1597&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;112&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22021&lt;/span&gt;: invalid byte sequence &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe3 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x80
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22021&lt;/span&gt;: invalid byte sequence &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe3 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x80
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- It looks like &amp;#34;阿&amp;#34; and &amp;#34;〇&amp;#34; cannot be converted to EUC_CN, but that&amp;#39;s not the whole story
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;B0&lt;span style="color:#f92672"&gt;&amp;gt;&amp;lt;&lt;/span&gt;A2&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The second row is &amp;#34;阿&amp;#34;. The database server/client appears to have converted the character set from UTF8 to EUC_CN.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- However, it may not display correctly due to UI client issues (currently CRT is set to UTF8)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Even changing CRT to GB2312 still won&amp;#39;t display correctly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;B0&lt;span style="color:#f92672"&gt;&amp;gt;&amp;lt;&lt;/span&gt;A2&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- When querying 〇, the database throws an error directly, indicating 〇 cannot be converted from UTF8 to EUC_CN
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; zh ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;P05: character &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; byte sequence &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe3 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x80 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x87 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;UTF8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; equivalent &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: report_untranslatable_char, mbutils.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1631&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Server with EUC_CN conversion test:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; server_encoding; &lt;span style="color:#75715e"&gt;-- Database has EUC_CN character set
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; server_encoding 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create the same zh table under the EUC_CN database, but inserting already has issues
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; zh &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;P05: character &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; byte sequence &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;xe3 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x80 &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;x87 &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;UTF8&amp;#34;&lt;/span&gt; has &lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; equivalent &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: report_untranslatable_char, mbutils.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;1631&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Again, the error says 〇 cannot be converted from UTF8 to EUC_CN. EUC_CN (GB2312) Chinese encoding is not fully identical to UTF8 — EUC_CN (GB2312) does not include all Chinese characters, especially rare ones.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Configuring locale, collation, and character set
 &lt;div id="configuring-locale-collation-and-character-set" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#configuring-locale-collation-and-character-set" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Now that we&amp;rsquo;ve covered localization and character sets, here&amp;rsquo;s a summary.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Database cluster locale, collation, character set
 &lt;div id="database-cluster-locale-collation-character-set" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#database-cluster-locale-collation-character-set" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;At initialization time, you can set the database cluster&amp;rsquo;s locale and character set:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb -D $DATADIR -E UTF8 --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb -D $DATADIR -E UTF8 --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc_collate&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C --lc_ctype&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;initdb -D $DATADIR -E UTF8 --locale&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc_collate&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C --lc_ctype&lt;span style="color:#f92672"&gt;=&lt;/span&gt;C --lc-messages&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc-monetary&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc-numeric&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8 --lc-time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;en_US.UTF8&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;initdb&lt;/code&gt; creates three databases: &lt;code&gt;postgres&lt;/code&gt;, &lt;code&gt;template1&lt;/code&gt;, and &lt;code&gt;template0&lt;/code&gt;. The &lt;code&gt;CREATE DATABASE&lt;/code&gt; statement defaults to using &lt;code&gt;template1&lt;/code&gt; to create databases.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;encoding&lt;/code&gt; sets the character set; &lt;code&gt;locale&lt;/code&gt; sets &lt;code&gt;LC_COLLATE&lt;/code&gt;, &lt;code&gt;LC_CTYPE&lt;/code&gt;, &lt;code&gt;LC_MESSAGES&lt;/code&gt;, &lt;code&gt;LC_MONETARY&lt;/code&gt;, &lt;code&gt;LC_NUMERIC&lt;/code&gt;, and &lt;code&gt;LC_TIME&lt;/code&gt;, unless specifically overridden (e.g., via &lt;code&gt;--lc_collate&lt;/code&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;LC_COLLATE&lt;/code&gt; and &lt;code&gt;LC_CTYPE&lt;/code&gt; are called collation and can also be set at the database, column, and index levels. &lt;code&gt;LC_MESSAGES&lt;/code&gt;, &lt;code&gt;LC_MONETARY&lt;/code&gt;, &lt;code&gt;LC_NUMERIC&lt;/code&gt;, and &lt;code&gt;LC_TIME&lt;/code&gt; are instance parameters that can be changed at any time.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;encoding&lt;/code&gt; can only be set at initialization or at database creation — once set, it cannot be changed.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Database collation and character set
 &lt;div id="database-collation-and-character-set" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#database-collation-and-character-set" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When creating a database, you can set the database&amp;rsquo;s character set, &lt;code&gt;lc_collate&lt;/code&gt;, and &lt;code&gt;lc_ctype&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Both &lt;code&gt;CREATE DATABASE&lt;/code&gt; and &lt;code&gt;createdb&lt;/code&gt; can specify the character set at database creation time. Once created, the database character set cannot be changed. Both commands use a template database to create the new database.&lt;/p&gt;
&lt;p&gt;There are two templates: &lt;code&gt;template0&lt;/code&gt; and &lt;code&gt;template1&lt;/code&gt;. The official documentation states:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Another common reason for copying &lt;code&gt;template0&lt;/code&gt; instead of &lt;code&gt;template1&lt;/code&gt; is that new encoding and locale settings can be specified when copying &lt;code&gt;template0&lt;/code&gt;, whereas a copy of &lt;code&gt;template1&lt;/code&gt; must use the same settings it does. This is because &lt;code&gt;template1&lt;/code&gt; might contain encoding-specific or locale-specific data, while &lt;code&gt;template0&lt;/code&gt; is known not to.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;code&gt;template1&lt;/code&gt; is a writable template database that may contain localized data, while &lt;code&gt;template0&lt;/code&gt; cannot be written to. Therefore, to create a database with different localization settings, you should use &lt;code&gt;template0&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;And you must explicitly use &lt;code&gt;template0&lt;/code&gt;, because the default is &lt;code&gt;template1&lt;/code&gt;. Attempting to create a database without specifying &lt;code&gt;template1&lt;/code&gt; and with a different character set will result in an error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_GB2312 &lt;span style="color:#66d9ef"&gt;ENCODING&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;EUC_CN&amp;#39;&lt;/span&gt; LC_COLLATE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; LC_CTYPE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22023&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;new&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; (EUC_CN) &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; incompatible &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; (UTF8)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;HINT: Use the same &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; the &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;, &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; use template0 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Additionally, you cannot set the character set by specifying &lt;code&gt;locale&lt;/code&gt; when creating a database:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_GB2312 locale &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;template0&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;22023&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;UTF8&amp;#34;&lt;/span&gt; does &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;match&lt;/span&gt; locale &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.gb2312&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: The chosen LC_CTYPE setting requires &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;EUC_CN&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: check_encoding_locale_matches, dbcommands.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;773&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The error indicates you need to specify the &lt;code&gt;LC_CTYPE&lt;/code&gt; sub-option. Adding all collation-related sub-options still produces an error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_GB2312 LOCALE &lt;span style="color:#e6db74"&gt;&amp;#39;EUC_CN&amp;#39;&lt;/span&gt; LC_COLLATE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; LC_CTYPE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42601&lt;/span&gt;: conflicting &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; redundant &lt;span style="color:#66d9ef"&gt;options&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: LOCALE cannot be specified together &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; LC_COLLATE &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; LC_CTYPE.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;LOCALE&lt;/code&gt; cannot be used together with &lt;code&gt;LC_CTYPE&lt;/code&gt; and other sub-options.&lt;/p&gt;
&lt;p&gt;Removing &lt;code&gt;locale&lt;/code&gt; and setting via character set, &lt;code&gt;LC_COLLATE&lt;/code&gt;, and &lt;code&gt;LC_CTYPE&lt;/code&gt; works successfully.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The correct way to create a database with a specific character set&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CREATE DATABASE&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_GB2312 &lt;span style="color:#66d9ef"&gt;ENCODING&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;EUC_CN&amp;#39;&lt;/span&gt; LC_COLLATE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; LC_CTYPE &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN.gb2312&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;template0&amp;#39;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;createdb&lt;/code&gt;:
Use the CLI command &lt;code&gt;createdb&lt;/code&gt;, which wraps &lt;code&gt;CREATE DATABASE&lt;/code&gt; — they are equivalent:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-shell" data-lang="shell"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; createdb -E EUC_CN -T template0 --lc-collate&lt;span style="color:#f92672"&gt;=&lt;/span&gt;zh_CN.gb2312 --lc-ctype&lt;span style="color:#f92672"&gt;=&lt;/span&gt;zh_CN.gb2312 db_GB2312&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Viewing database character set:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;\l&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;pg_database&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; datname,pg_encoding_to_char(&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt;),datcollate,datctype,datlocprovider,daticulocale &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_database;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;&lt;code&gt;SHOW&lt;/code&gt; parameters&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;code&gt;SERVER_ENCODING&lt;/code&gt;, &lt;code&gt;LC_COLLATE&lt;/code&gt;, and &lt;code&gt;LC_CTYPE&lt;/code&gt; are all immutable parameters that display the &lt;em&gt;current&lt;/em&gt; database&amp;rsquo;s server-side character set, &lt;code&gt;LC_COLLATE&lt;/code&gt;, and &lt;code&gt;LC_CTYPE&lt;/code&gt;, respectively.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Column collation
 &lt;div id="column-collation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#column-collation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Collation is only related to character sorting and character functions — it is not related to encoding. Without indexes, changing a column&amp;rsquo;s collation is essentially just adjusting the default sort output for that column. With indexes, it will rebuild the index. If no collation is specified for a column, it defaults to the database&amp;rsquo;s collation.&lt;/p&gt;
&lt;p&gt;Specifying collation when creating a table (note: some data types are un-collatable, such as &lt;code&gt;int&lt;/code&gt;):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t1(col1 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; col1 &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Note: &lt;code&gt;ALTER TABLE&lt;/code&gt; without changing the length will not rewrite the table, but it will definitely rebuild the index.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Viewing a column&amp;rsquo;s default collation&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;. &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; t1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;. information_schema.columns
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; table_catalog,table_schema,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;column_name&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;collation_name&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.columns &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;t1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;. pg_attribute
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a.attrelid::regclass,a.attname,a.attcollation,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collname,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collcollate,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_attribute a &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; a.attcollation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a.attrelid::regclass&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; a.attcollation&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Method 3 is recommended. While &lt;code&gt;\d+&lt;/code&gt; and &lt;code&gt;information_schema.columns&lt;/code&gt; can show &lt;code&gt;collname&lt;/code&gt;, &lt;code&gt;collname&lt;/code&gt; is not unique. Only method 3 reveals &lt;code&gt;collate&lt;/code&gt; and &lt;code&gt;ctype&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Test: specifying collate and viewing &lt;code&gt;pg_attribute&lt;/code&gt;:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) ,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col2 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col3 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col4 varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Column collation is like tagging the column with a default sort order; you can&amp;#39;t see the specific collate and ctype
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.tlzl&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Compression &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------+------------+----------+---------+----------+-------------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- collname and collate/ctype are not one-to-one; col3&amp;#39;s zh_CN alone doesn&amp;#39;t reveal which collate is used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_encoding_to_char(collencoding) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt;,collname,collcollate,collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; collname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;zh_CN%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collctype 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+--------------+--------------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; EUC_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.gb2312
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; UTF8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- pg_attribute shows more precisely than \d+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a.attrelid::regclass,a.attname,a.attcollation,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collname,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collcollate,&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_attribute a &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; a.attcollation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.oid &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a.attrelid::regclass&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;::regclass &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; a.attcollation&lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; attrelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; attname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; attcollation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collctype 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+--------------+------------+-------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;950&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;12562&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; col3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13200&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN.utf8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Now we know that col3 zh_CN&amp;#39;s collate is zh_CN.utf8 &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Test: table rewrite when modifying column collate:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add an index to the column and check rewrite behavior
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxcol4 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl(col4);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;) TableRelid, pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idxcol4&amp;#39;&lt;/span&gt;) IndexRelid; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablerelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------+------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41006&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41015&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; col4 &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;) TableRelid, pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idxcol4&amp;#39;&lt;/span&gt;) IndexRelid; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablerelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------+------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41006&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41016&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Table was not rewritten; index was rewritten&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A column&amp;rsquo;s collation is merely a marker. Modifying the column&amp;rsquo;s collation does not rewrite the table, but if there is an index on it, the index will be rewritten (sometimes not — see the next section).&lt;/p&gt;

&lt;h3 class="relative group"&gt;Index collation
 &lt;div id="index-collation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-collation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When creating an index, if the index&amp;rsquo;s collation is not explicitly specified, the index uses the collation declared on the column.&lt;/p&gt;
&lt;p&gt;Explicitly specifying collation when creating an index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_C &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tlzl(col3 &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;); &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Additionally, indexes can be created with &lt;code&gt;text_pattern_ops&lt;/code&gt;, &lt;code&gt;varchar_pattern_ops&lt;/code&gt;, &lt;code&gt;bpchar_pattern_ops&lt;/code&gt; — in this case, the index does not depend on collation rules but compares character by character:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The difference from the default operator classes is that the values are compared strictly character by character rather than according to the locale-specific collation rules.&lt;/p&gt;
&lt;/blockquote&gt;&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; test_index &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; test_table (col varchar_pattern_ops);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In fact, this type of index is not entirely unrelated to collation — an index always has a sort order. This type of index&amp;rsquo;s sort order appears to be consistent with &lt;code&gt;C&lt;/code&gt;. See the &amp;ldquo;LIKE not using index&amp;rdquo; section.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Viewing an index&amp;rsquo;s collation:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- \d+ shows indexes with explicitly specified collate; if not specified, the column&amp;#39;s default collation is used
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; tlzl
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.tlzl&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Compression &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------+------------+----------+---------+----------+-------------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zh_CN &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; col4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; en_US.utf8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_c&amp;#34;&lt;/span&gt; btree (col3 &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idxcol4&amp;#34;&lt;/span&gt; btree (col4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Viewing via &lt;code&gt;pg_index&lt;/code&gt; is clearer (the &lt;code&gt;indcollation&lt;/code&gt; type in &lt;code&gt;pg_index&lt;/code&gt; is &lt;code&gt;oidvector&lt;/code&gt; and cannot be directly cast to &lt;code&gt;oid&lt;/code&gt;, making queries a bit cumbersome):&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; indcollation,indexrelid::regclass &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_index &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; indexrelid::regclass &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;idx_C&amp;#39;&lt;/span&gt;::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; indcollation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;950&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; idx_c
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; oid,pg_encoding_to_char(collencoding) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt;,collname,collcollate,collctype &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_collation &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; oid&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;950&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; oid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collcollate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; collctype 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----+----------+----------+-------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;950&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;C&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Also, you cannot change an index&amp;rsquo;s collation via &lt;code&gt;ALTER INDEX&lt;/code&gt; — you must drop and recreate it.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Test: After specifying an index collate, does modifying the column&amp;rsquo;s collate rewrite the index?&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;) TableRelid, pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idxcol4&amp;#39;&lt;/span&gt;) IndexRelid4,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_c&amp;#39;&lt;/span&gt;) IndexRelidC; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablerelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelidc 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------+------------------+------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41020&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41023&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41024&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tlzl &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; col3 &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;tlzl&amp;#39;&lt;/span&gt;) TableRelid, pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idxcol4&amp;#39;&lt;/span&gt;) IndexRelid4,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx_c&amp;#39;&lt;/span&gt;) IndexRelidC; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tablerelid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelid4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; indexrelidc 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------+------------------+------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41020&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41023&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40996&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41024&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- idx_c&amp;#39;s relfileid did not change&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If an index&amp;rsquo;s collate has been explicitly specified, modifying the column&amp;rsquo;s default collate will not rewrite that index.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Client character set
 &lt;div id="client-character-set" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#client-character-set" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When the client sets a character set different from the database, character set conversion occurs — though conversion may not always succeed. See the &amp;ldquo;Character Set Conversion&amp;rdquo; section for details.&lt;/p&gt;
&lt;p&gt;The server-side character set cannot be changed after database creation, but the client character set can be adjusted at any time.&lt;/p&gt;
&lt;p&gt;There are many ways to set the client character set:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Set directly on the client:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; UTF8 &lt;span style="color:#75715e"&gt;-- psql only
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; CLIENT_ENCODING &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; UTF8; &lt;span style="color:#75715e"&gt;-- session-level parameter change
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NAMES&lt;/span&gt; UTF8; &lt;span style="color:#75715e"&gt;-- SQL standard&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Set the &lt;code&gt;PGCLIENTENCODING&lt;/code&gt; environment variable&lt;/li&gt;
&lt;li&gt;Set the &lt;code&gt;client_encoding&lt;/code&gt; server configuration parameter&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Priority: client-side setting &amp;gt; &lt;code&gt;PGCLIENTENCODING&lt;/code&gt; environment variable &amp;gt; &lt;code&gt;client_encoding&lt;/code&gt; server configuration parameter&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Viewing the client character set:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;encoding&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- psql only
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SHOW&lt;/span&gt; client_encoding;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Expression collate
 &lt;div id="expression-collate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#expression-collate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Adding &lt;code&gt;COLLATE&lt;/code&gt; to an expression overrides the expression&amp;rsquo;s original collation, effectively specifying a sort collation.&lt;/p&gt;
&lt;p&gt;Add the &lt;code&gt;COLLATE&lt;/code&gt; keyword at the end of the expression:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;expr &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;collation&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- For example
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tab1 &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;COLLATE&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For details on sorting and collate index selection, see the &amp;ldquo;Sort Result Issues&amp;rdquo; section.&lt;/p&gt;

&lt;h2 class="relative group"&gt;MORE
 &lt;div id="more" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#more" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Concept Summary
 &lt;div id="concept-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#concept-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;PostgreSQL localization has three important concepts: character set, locale, and collation — it&amp;rsquo;s essential to understand their relationships.&lt;/p&gt;
&lt;p&gt;The server-side character set setting is very important: it can only be specified at initialization and database creation time, and cannot be modified after the database is created. The character set choice directly affects the encoding method. Collation does not, but there is a dependency between the two. Locale can likewise be specified at initialization, and among them, collation can be set at database creation time or individually on columns — note that these are merely defaults. Only when specifying collation at index creation does it affect the actual storage order. Different collations cannot use the same index, even if they share the same origin.&lt;/p&gt;
&lt;p&gt;Client character set and the four parameters (&lt;code&gt;LC_MESSAGES&lt;/code&gt;, etc.) are relatively simple — they can be modified directly via parameters and are unrelated to data storage.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6c422851b81d.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Sort Result Issues
 &lt;div id="sort-result-issues" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sort-result-issues" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since UTF8 is the most common character set, we&amp;rsquo;ll test sorting with UTF-related collations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; db_UTF8 &lt;span style="color:#66d9ef"&gt;ENCODING&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;UTF8&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;template&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;template0&amp;#39;&lt;/span&gt;; &lt;span style="color:#75715e"&gt;-- Create a UTF8 database; collation doesn&amp;#39;t matter
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;use db_UTF8;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; tzlz(name varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;),(&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ORDER BY results with different collations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;zh_CN.utf8&amp;#34;&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Order&lt;/th&gt;
 &lt;th&gt;default&lt;/th&gt;
 &lt;th&gt;C&lt;/th&gt;
 &lt;th&gt;en_US&lt;/th&gt;
 &lt;th&gt;en_US.utf8&lt;/th&gt;
 &lt;th&gt;zh_CN&lt;/th&gt;
 &lt;th&gt;zh_CN.utf8&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;1&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;2&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;3&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;a&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;A&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;4&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;aa&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;5&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;AA&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;6&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;td&gt;啊&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;7&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;阿&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;td&gt;〇&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;em&gt;Here, &lt;code&gt;default&lt;/code&gt; is &lt;code&gt;en_US.utf8&lt;/code&gt; (column collation(default) → database collation(en_US.utf8))&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&amp;#x1f31f; &lt;strong&gt;C, en_US.utf8, and zh_CN.utf8 all produce different sort results!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Collate and index scan test:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_default &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz(name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_C &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz(name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_enUS_utf8 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz(name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Using collate for index optimization:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Without any collate keyword, a simple index scan; no extra sorting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8_c&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_default &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Adding collate conversion to the predicate hits the correct index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_c &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_enus_utf8 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- However, the collation name must match exactly
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;232&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- ORDER BY also needs the collate conversion expression
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Here, the correct index is used, but ORDER BY treats them as different collations (even though they are the same)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;23&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_enus_utf8 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Adding collate conversion to both WHERE and ORDER BY selects the right index and avoids extra sorting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;aa&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;A&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;AA&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;啊&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;〇&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_enus_utf8 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (name &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ANY&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;{a,aa,A,AA,啊,阿,〇}&amp;#39;&lt;/span&gt;::text[]))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;After specifying a collation on an index, the SQL must explicitly use the COLLATE keyword to convert the expression. Even if the default is the same as the current collation, PostgreSQL will not use the index.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;LIKE not using index
 &lt;div id="like-not-using-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#like-not-using-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;blockquote&gt;&lt;p&gt;The drawback of using locales other than &lt;code&gt;C&lt;/code&gt; or &lt;code&gt;POSIX&lt;/code&gt; in PostgreSQL is its performance impact. It slows character handling and prevents ordinary indexes from being used by &lt;code&gt;LIKE&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;PostgreSQL&amp;rsquo;s own words: using non-C or non-POSIX prevents ordinary indexes from being used!&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_c &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (name &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a%&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_c &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (name &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;a%&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;PostgreSQL converts &lt;code&gt;LIKE&lt;/code&gt; to &lt;code&gt;&amp;gt;=&lt;/code&gt; and &lt;code&gt;&amp;lt;&lt;/code&gt; during index scans, where &lt;code&gt;&amp;lt;&lt;/code&gt; adds a &amp;ldquo;one step greater&amp;rdquo; value. This is where the problem lies: collation is strongly tied to sorting order. In ASCII, &lt;code&gt;a+1&lt;/code&gt; is &lt;code&gt;b&lt;/code&gt;, but what about Chinese characters?&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idxzlz_c &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (name &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;陿&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Sure enough, another Chinese character appears!&lt;/p&gt;
&lt;p&gt;If it&amp;rsquo;s a sequential scan, the &lt;code&gt;&amp;gt;=&lt;/code&gt; and &lt;code&gt;&amp;lt;&lt;/code&gt; won&amp;rsquo;t appear:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_c;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DROP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;en_US.utf8&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;170&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can create an index that is (claimed by the PostgreSQL docs to be) unrelated to collation rules:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_pattern &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; tzlz (name varchar_pattern_ops);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Let&amp;rsquo;s look at its execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;db_utf8&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; tzlz &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_pattern &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name &lt;span style="color:#f92672"&gt;~&amp;gt;=~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (name &lt;span style="color:#f92672"&gt;~&amp;lt;~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;陿&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((name)::text &lt;span style="color:#f92672"&gt;~~&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;阿%&amp;#39;&lt;/span&gt;::text)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It still auto-generates the &amp;ldquo;one greater&amp;rdquo; string — this is definitely related to collation. It appears to be using C.&lt;/p&gt;
&lt;p&gt;So we can conclude:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;When PostgreSQL uses a regular index for LIKE, it needs to convert it to &lt;code&gt;&amp;gt;=&lt;/code&gt; and &lt;code&gt;&amp;lt;&lt;/code&gt;, which requires a &amp;ldquo;one greater&amp;rdquo; value relative to the current string. Since collation is strongly tied to ordering, only an index using the same collation can guarantee data correctness. PostgreSQL chooses the non-localized C collation for this.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The quickest workaround is to create a C collation index or a pattern index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idxzlz_C &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; tzlz(name &lt;span style="color:#66d9ef"&gt;collate&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_pattern &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; tzlz (name varchar_pattern_ops);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For other adjustments to default collation at various levels, refer to the sections above.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Developers typically don&amp;rsquo;t specify collation when creating indexes. If it&amp;rsquo;s not C or pattern, LIKE won&amp;rsquo;t use the index. Combined with the common choice of the international character set UTF8, this leaves very few localization options in database operations. The recommended setup: character set UTF8, collation C.&lt;/em&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://dbafix.com/what-is-the-impact-of-lc_ctype-on-a-postgresql-database/#:~:text=Having%20LC_CTYPE%20set%20to%20%E2%80%98C%E2%80%99%20implies%20that%20C,Postgres%20on%20top%20of%20these%20libc%20functions%2C%20they%E2%80%99re" target="_blank" rel="noreferrer"&gt;https://dbafix.com/what-is-the-impact-of-lc_ctype-on-a-postgresql-database/#:~:text=Having%20LC_CTYPE%20set%20to%20%E2%80%98C%E2%80%99%20implies%20that%20C,Postgres%20on%20top%20of%20these%20libc%20functions%2C%20they%E2%80%99re&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/charset.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/charset.html&lt;/a&gt;
&lt;a href="https://www.bookstack.cn/read/rds-best-pratice/bfc0037fe00d87dc.md" target="_blank" rel="noreferrer"&gt;https://www.bookstack.cn/read/rds-best-pratice/bfc0037fe00d87dc.md&lt;/a&gt;
&lt;a href="https://help.aliyun.com/zh/rds/apsaradb-rds-for-postgresql/configure-the-collation-of-a-database-on-an-apsaradb-rds-for-postgresql-instance" target="_blank" rel="noreferrer"&gt;https://help.aliyun.com/zh/rds/apsaradb-rds-for-postgresql/configure-the-collation-of-a-database-on-an-apsaradb-rds-for-postgresql-instance&lt;/a&gt;
&lt;a href="https://baike.baidu.com/item/%E7%BB%9F%E4%B8%80%E7%A0%81/2985798?fromModule=lemma_inlink&amp;amp;fromtitle=Unicode&amp;amp;fromid=750500" target="_blank" rel="noreferrer"&gt;https://baike.baidu.com/item/%E7%BB%9F%E4%B8%80%E7%A0%81/2985798?fromModule=lemma_inlink&amp;fromtitle=Unicode&amp;fromid=750500&lt;/a&gt;
&lt;a href="https://baike.baidu.com/item/%E4%B8%AD%E6%97%A5%E9%9F%A9%E8%B6%8A%E7%BB%9F%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97/1301611?fromModule=lemma_inlink" target="_blank" rel="noreferrer"&gt;https://baike.baidu.com/item/%E4%B8%AD%E6%97%A5%E9%9F%A9%E8%B6%8A%E7%BB%9F%E4%B8%80%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97/1301611?fromModule=lemma_inlink&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://blog.csdn.net/songyundong1993/article/details/128739919" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/songyundong1993/article/details/128739919&lt;/a&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Original article (Chinese): &lt;a href="https://lastdba.com/2024/08/12/postgresql%E6%9C%AC%E5%9C%B0%E5%8C%96/" target="_blank" rel="noreferrer"&gt;PostgreSQL本地化&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;</content:encoded></item><item><title>PostgreSQL Table Partitioning Deep Dive</title><link>https://lastdba.com/en/2024/08/12/postgresql-table-partitioning-deep-dive/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/postgresql-table-partitioning-deep-dive/</guid><description>&lt;h2 class="relative group"&gt;What is a Partitioned Table
 &lt;div id="what-is-a-partitioned-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-partitioned-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/787a5ce076e9.png" alt="Postgres Table Partitioning" /&gt;
Database partitioning splits table data into smaller physical shards to improve performance, availability, and manageability. Partitioned tables are a common optimization technique for large tables in relational databases. DBMS generally provide partition management, and applications can access partitioned tables directly without changing their architecture—though good performance requires proper partition access patterns.&lt;/p&gt;
&lt;p&gt;Partitioned tables are common database technology, but PostgreSQL partitioned tables have many unique characteristics: multiple implementation approaches, partitions being regular tables, partition maintenance strategies, SQL optimization considerations, and some known issues.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;What is a Partitioned Table
 &lt;div id="what-is-a-partitioned-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-is-a-partitioned-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/787a5ce076e9.png" alt="Postgres Table Partitioning" /&gt;
Database partitioning splits table data into smaller physical shards to improve performance, availability, and manageability. Partitioned tables are a common optimization technique for large tables in relational databases. DBMS generally provide partition management, and applications can access partitioned tables directly without changing their architecture—though good performance requires proper partition access patterns.&lt;/p&gt;
&lt;p&gt;Partitioned tables are common database technology, but PostgreSQL partitioned tables have many unique characteristics: multiple implementation approaches, partitions being regular tables, partition maintenance strategies, SQL optimization considerations, and some known issues.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Partition Table Implementations
 &lt;div id="partition-table-implementations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-implementations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL provides various partition implementation approaches. The officially supported methods are declarative partitioning and inheritance partitioning, while third-party plugins include pg_pathman, pg_partman, etc. Since the introduction of official declarative partitioning, only one approach is generally recommended: declarative partitioning. Covering every implementation&amp;rsquo;s features, details, and history would make this article excessively long and is less relevant going forward. This article focuses mainly on declarative partitioning, with brief introductions to other approaches. However, due to existing deployments and feature differences, understanding declarative partitioning, inheritance partitioning, and pg_pathman remains valuable.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Declarative Partitioning
 &lt;div id="declarative-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#declarative-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Declarative partitioning, also called native partitioning, has been supported since PG10. It is the &amp;ldquo;officially supported&amp;rdquo; partitioning approach and the most recommended method. Although different from inheritance partitioning, declarative partitioning is also implemented internally using table inheritance. It supports only three partition methods: RANGE, LIST, and HASH.&lt;/p&gt;

&lt;h4 class="relative group"&gt;RANGE Partitioning
 &lt;div id="range-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#range-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f252d7be9e4d.png" alt="" /&gt;
RANGE partitioned tables split data by range, with partition boundaries defined as [t1, t2) (inclusive lower bound, exclusive upper bound).&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PUBLIC&lt;/span&gt;.LZLPARTITION1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id int,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name varchar(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; DATE_CREATED &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; now()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; RANGE(DATE_CREATED);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.lzlpartition1 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;(id,DATE_CREATED)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1_202301 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1_202302 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert some data into the partitioned table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;, md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text),&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-28&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;1 minute&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;83521&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For RANGE partitioning, the FROM t1 TO t2 boundary uses the [t1, t2) convention: the lower bound is inclusive and the upper bound is exclusive.&lt;/p&gt;
&lt;p&gt;Inspecting the partitioned table shows that each partition is also an independent table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Compression &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: RANGE (date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202302 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Compression &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: lzlpartition1 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Primary keys, indexes, and NOT NULL/CHECK constraints are automatically created on partitions. Since partitions are independent tables, constraints and indexes can also be created on individual partitions. (ATTACH does not automatically create these — see the ATTACH section for details.)&lt;/p&gt;

&lt;h4 class="relative group"&gt;LIST Partitioning
 &lt;div id="list-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#list-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e3d094556a5d.png" alt="" /&gt;
LIST partitioning stores data in the corresponding partition based on specified partition key values.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; cities (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; city_id bigserial &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name text,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; population bigint
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; LIST (&lt;span style="color:#66d9ef"&gt;left&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;lower&lt;/span&gt;(name), &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;));&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; cities_ab
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; cities &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; cities_null
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; cities (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINT&lt;/span&gt; city_id_nonzero &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (city_id &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; cities(name,population) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;Acity&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; cities(name,population) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; cities;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; city_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; population 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+---------+--------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cities_ab &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Acity &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; cities_null &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;LIST partitioned tables support creating a NULL partition.&lt;/p&gt;

&lt;h4 class="relative group"&gt;HASH Partitioning
 &lt;div id="hash-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hash-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9e18df7edc15.png" alt="" /&gt;
HASH partitioning distributes data across partitions to spread out hot data evenly.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders (order_id int,name varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;)) PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; HASH (order_id);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p1 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p2 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p3 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You cannot create a default partition, nor can you create more partitions than the specified MODULUS.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p2 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;P16: remainder &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; hash partition must be &lt;span style="color:#66d9ef"&gt;less&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;than&lt;/span&gt; modulus
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: transformPartitionBound, parse_utilcmd.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3939&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p4 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42&lt;/span&gt;P16: a hash&lt;span style="color:#f92672"&gt;-&lt;/span&gt;partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; may &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; have a &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; partition
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: transformPartitionBound, parse_utilcmd.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3909&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Insert data:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;),&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3277&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3354&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3369&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; tableoid::regclass,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; order_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+----------+------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;19&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; a&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;HASH partition data is distributed evenly across partitions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert 100 NULL rows
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;)::text);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; order_id &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- All NULL data ends up on the remainder 0 partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; orders_p1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.orders_p1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+-----------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; order_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (modulus &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, remainder &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, order_id)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although HASH partitioned tables have no concept of a NULL partition, they can store NULL data. NULL values are placed on the remainder 0 partition.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Multi-level (Mixed) Partitioning
 &lt;div id="multi-level-mixed-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#multi-level-mixed-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Partitions can themselves be further partitioned, forming a cascading structure. Sub-partitions can use different partition methods — this is called mixed partitioning.



&lt;img src="https://lastdba.com/img/csdn/220e4e6f1544.png" alt="" /&gt;
Creating a mixed partition:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_1000(id bigserial &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,name varchar(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;),createddate &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt;) partition &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; range(createddate);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_2001 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_1000 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) partition &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; list(name) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_2002 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_1000 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) partition &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; list(name) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_2003 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_1000 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;) partition &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; list(name) ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_3001 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_2001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_3002 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_2001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;def&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; part_3003 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; part_2001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;jkl&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;\d+ only shows the immediate next-level partitions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; part_1000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;dbmgr.part_1000&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----------------------------+-----------+----------+---------------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;part_1000_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; createddate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: RANGE (createddate)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: part_2001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;), PARTITIONED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_2002 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;), PARTITIONED,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_2003 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;), PARTITIONED
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; part_2001
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;dbmgr.part_2001&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------+-----------------------------+-----------+----------+---------------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;part_1000_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; createddate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: part_1000 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((createddate &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (createddate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (createddate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: LIST (name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: part_3001 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_3002 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;def&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_3003 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;jkl&amp;#39;&lt;/span&gt;) &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now insert a row:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; part_1000 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 08:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; part_1000;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; createddate 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+------+------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; part_3001 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6385&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; abc &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Data is stored in the lowest-level sub-partition.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Declarative Partitioning Feature Summary
 &lt;div id="declarative-partitioning-feature-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#declarative-partitioning-feature-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;No INTERVAL partitioning&lt;/strong&gt;. There is no built-in automatic partition creation feature, which makes maintenance more cumbersome.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Partitions themselves are tables&lt;/strong&gt;. This is a distinctive characteristic. This not only allows PostgreSQL to flexibly operate on sub-partitions but, more importantly, affects functionality and behavior.&lt;/li&gt;
&lt;li&gt;TRUNCATE, VACUUM, and ANALYZE on a partitioned table operate on all partitions. TRUNCATE ONLY cannot be executed on the parent table but can be executed on a child table containing data, clearing only that sub-partition.&lt;/li&gt;
&lt;li&gt;RANGE and HASH partition keys can have multiple columns; LIST partition keys can only be a single column or expression.&lt;/li&gt;
&lt;li&gt;The partitioned parent table itself is empty; only the lowest-level sub-partitions contain data.&lt;/li&gt;
&lt;li&gt;A DEFAULT partition receives data that falls outside declared ranges. Without a DEFAULT partition, inserting out-of-range data will raise an error.&lt;/li&gt;
&lt;li&gt;When adding a new partition, check whether the DEFAULT partition contains data belonging to the new partition.&lt;/li&gt;
&lt;li&gt;Partitions created via PARTITION OF automatically create indexes, constraints, and row-level triggers from the parent table.&lt;/li&gt;
&lt;li&gt;ATTACH does not handle any indexes, constraints, or other objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Inheritance Partitioning
 &lt;div id="inheritance-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inheritance-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Inheritance partitioning is also officially supported. It leverages PostgreSQL&amp;rsquo;s table inheritance feature to implement partitioning functionality. Inheritance partitioning is more flexible than declarative partitioning.
Implementing inheritance partitioning requires two PostgreSQL features: &lt;a href="https://www.postgresql.org/docs/current/ddl-inherit.html" target="_blank" rel="noreferrer"&gt;table inheritance&lt;/a&gt; and write redirection. Write redirection can be implemented via &lt;a href="https://www.postgresql.org/docs/current/rules.html" target="_blank" rel="noreferrer"&gt;rules&lt;/a&gt; or triggers.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Creating Inheritance Partition Tables
 &lt;div id="creating-inheritance-partition-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#creating-inheritance-partition-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Example of creating inheritance partitioned tables:
&lt;strong&gt;1. Create the parent table&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; city_id int &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logdate date &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; peaktemp int,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; unitsales int
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;2. Create child tables with CHECK constraints for partitioning ranges&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202308 (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) &lt;span style="color:#66d9ef"&gt;INHERITS&lt;/span&gt; (measurement);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202309 (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-10-01&amp;#39;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;) &lt;span style="color:#66d9ef"&gt;INHERITS&lt;/span&gt; (measurement);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;3. Create rules or triggers to redirect inserted data to the corresponding child tables&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;REPLACE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FUNCTION&lt;/span&gt; measurement_insert_trigger()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;RETURNS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRIGGER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;IF&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; ) &lt;span style="color:#66d9ef"&gt;THEN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; measurement_202308 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ELSIF&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-10-01&amp;#39;&lt;/span&gt; ) &lt;span style="color:#66d9ef"&gt;THEN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; measurement_202309 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ELSE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; RAISE &lt;span style="color:#66d9ef"&gt;EXCEPTION&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;Date out of range. Fix the measurement_insert_trigger() function!&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;END&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;IF&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;RETURN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;END&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LANGUAGE&lt;/span&gt; plpgsql;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TRIGGER&lt;/span&gt; insert_measurement_trigger
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;BEFORE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; measurement
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EACH&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ROW&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FUNCTION&lt;/span&gt; measurement_insert_trigger();&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;A basic inheritance partitioned table is now set up.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; measurement
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.measurement&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+---------+-----------+----------+---------+---------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; city_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logdate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; peaktemp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; unitsales &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Triggers:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; insert_measurement_trigger &lt;span style="color:#66d9ef"&gt;BEFORE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EACH&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ROW&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;EXECUTE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FUNCTION&lt;/span&gt; measurement_insert_trigger()
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Child tables: measurement_202308,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; measurement_202309
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Test insertion and querying:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Inserting data outside the defined range raises an error
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;, now() &lt;span style="color:#f92672"&gt;-&lt;/span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;31&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt; ,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: P0001: Date &lt;span style="color:#66d9ef"&gt;out&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; range. Fix the measurement_insert_trigger() &lt;span style="color:#66d9ef"&gt;function&lt;/span&gt;&lt;span style="color:#f92672"&gt;!&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CONTEXT: PL&lt;span style="color:#f92672"&gt;/&lt;/span&gt;pgSQL &lt;span style="color:#66d9ef"&gt;function&lt;/span&gt; measurement_insert_trigger() line &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;at&lt;/span&gt; RAISE
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: exec_stmt_raise, pl_exec.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3889&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Inserting data is redirected to the child table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;,now(),&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Querying the parent table returns data from child tables
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; tableoid::regclass,&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; measurement;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; city_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;|&lt;/span&gt; peaktemp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; unitsales 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------+---------+------------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; measurement_202308 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;RULE vs. Trigger&lt;/strong&gt;
Besides triggers, PostgreSQL can also use rules to redirect inserts.
Example rule statements:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;RULE&lt;/span&gt; measurement_insert_202308 &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-01&amp;#39;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSTEAD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; measurement_202308 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;RULE&lt;/span&gt; measurement_insert_202309 &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-09-01&amp;#39;&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;DO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSTEAD&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; measurement_202309 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;NEW&lt;/span&gt;.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Differences between rules and triggers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Rules have worse performance than triggers in general, but for bulk inserts rules perform better since they only check once. In all other cases, triggers are preferable.&lt;/li&gt;
&lt;li&gt;COPY does not fire rules but does fire triggers. When using rules, data can be COPY&amp;rsquo;d directly into child tables.&lt;/li&gt;
&lt;li&gt;When inserting data outside defined ranges, rules will insert into the parent table, while triggers will raise an error.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Indexes&lt;/strong&gt;
To improve performance, you also need to create indexes and enable constraint_exclusion. Indexes on partitions are generally essential. For inheritance tables, indexes must be manually created on child tables.
Example of creating indexes:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_measurement_202308_logdate &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; measurement_202308 (logdate);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_measurement_202309_logdate &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; measurement_202309 (logdate);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Insert some data and check the execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- &amp;#39;2023-08-04&amp;#39; has only 1 row, allowing it to use the index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;,now()&lt;span style="color:#f92672"&gt;+&lt;/span&gt;interval &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;day&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;),&lt;span style="color:#e6db74"&gt;&amp;#39;a&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;),now(),&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; logdate&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-04&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement measurement_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (logdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-04&amp;#39;&lt;/span&gt;::date)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_measurement_202308_logdate &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement_202308 measurement_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (logdate &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-08-04&amp;#39;&lt;/span&gt;::date)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In the above execution plan, the August partition uses the index on the partition. Since constraint_exclusion is enabled by default for inheritance tables, the September partition was excluded and only August was scanned. However, because the parent table has no constraints (and cannot have them), it always appears in the execution plan—but since the parent table is generally empty, this has minimal impact.&lt;/p&gt;

&lt;h4 class="relative group"&gt;constraint_exclusion
 &lt;div id="constraint_exclusion" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#constraint_exclusion" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;constraint_exclusion controls whether the optimizer uses constraints to reduce unnecessary table access. This parameter is commonly used in inheritance partitioning optimization — by reducing child table access, it improves SQL performance. (This functionality is similar to the enable_partition_pruning parameter, which controls partition pruning for declarative partitioned tables.) constraint_exclusion has three values:
&lt;code&gt;on&lt;/code&gt;: All tables are checked for constraints.
&lt;code&gt;partition&lt;/code&gt;: Inheritance tables and UNION ALL subqueries are checked for constraints (default).
&lt;code&gt;off&lt;/code&gt;: Constraints are not checked.
Constraint exclusion only occurs during execution plan generation, not during actual execution (partition pruning can occur during execution). This means constraint exclusion does not happen when using bound parameters or variable values.
For example, when using functions like now() whose specific value the optimizer cannot determine, the optimizer cannot exclude partitions that don&amp;rsquo;t need to be accessed at all:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; now();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; now 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;03&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;772658&lt;/span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The optimizer did not exclude the September partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; logdate&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;now();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1628&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement measurement_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (logdate &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; now())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement_202308 measurement_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1010&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (logdate &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; now())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; measurement_202309 measurement_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;44&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;26&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;617&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: (logdate &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; now())
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; idx_measurement_202309_logdate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;617&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (logdate &lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; now())&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Additionally, constraint exclusion itself needs to check all child table constraints. If there are too many child table constraints, the efficiency of generating execution plans will be affected. Therefore, inheritance partitioning is not recommended for creating too many child partitions.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Adding/Removing Partitions in Inheritance Partitioning
 &lt;div id="addingremoving-partitions-in-inheritance-partitioning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#addingremoving-partitions-in-inheritance-partitioning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;To turn an inherited partition into a regular table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202308 &lt;span style="color:#66d9ef"&gt;NO&lt;/span&gt; INHERIT measurement;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;To add an existing regular table (with data) as a child table in the inheritance partition:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202310 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; measurement &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202310 &lt;span style="color:#66d9ef"&gt;ADD&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINT&lt;/span&gt; measurement_202310_logdate_check 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ( logdate &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-10-01&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; logdate &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; DATE &lt;span style="color:#e6db74"&gt;&amp;#39;2023-11-01&amp;#39;&lt;/span&gt; );
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--insert into measurement_202310 values(2001,&amp;#39;20231010&amp;#39;,3,3);
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; measurement_202310 INHERIT measurement;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 class="relative group"&gt;Inheritance Partitioning Feature Summary
 &lt;div id="inheritance-partitioning-feature-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inheritance-partitioning-feature-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Inheritance partitioning is more flexible than declarative partitioning, but some declarative partitioning features are unavailable.&lt;/li&gt;
&lt;li&gt;Child tables inherit parent table constraints, so global constraints should not be set on the parent table.&lt;/li&gt;
&lt;li&gt;Indexes are not inherited; they must be created individually on each child table.&lt;/li&gt;
&lt;li&gt;Declarative partitioning only supports RANGE, LIST, and HASH partitions. Inheritance partitioning can support more, including custom partitioning methods.&lt;/li&gt;
&lt;li&gt;Dropping a child table does not invalidate the trigger. PostgreSQL does not have Oracle&amp;rsquo;s concept of invalidated objects (indexes do have an invalidation concept).&lt;/li&gt;
&lt;li&gt;Generally, using triggers for insert redirection is more efficient than rules.&lt;/li&gt;
&lt;li&gt;When adding a new partition, if the trigger function lacks a rule for that partition, the trigger function needs to be updated.&lt;/li&gt;
&lt;li&gt;Inheritance partitioning supports multiple inheritance.&lt;/li&gt;
&lt;li&gt;Constraint exclusion cannot occur during execution; using fixed values for queries is recommended.&lt;/li&gt;
&lt;li&gt;With inheritance partitioning, avoid creating too many child partitions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;pg_pathman
 &lt;div id="pg_pathman" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_pathman" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;pg_pathman is a third-party plugin implementing partitioning functionality. The &lt;a href="https://github.com/postgrespro/pg_pathman" target="_blank" rel="noreferrer"&gt;pg_pathman README on GitHub&lt;/a&gt; and &lt;a href="https://developer.aliyun.com/article/62314" target="_blank" rel="noreferrer"&gt;articles on using pg_pathman&lt;/a&gt; already describe pathman in great detail. Here we only highlight key points and do some simple testing.&lt;/p&gt;

&lt;h4 class="relative group"&gt;pg_pathman Basics
 &lt;div id="pg_pathman-basics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pg_pathman-basics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;No Longer Maintained&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;NOTE: this project is not under development anymore&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;pg_pathman supports PostgreSQL 9.5 through 15. Later PostgreSQL versions are no longer supported, and existing versions only receive bug fixes — no new features will be added.
pg_pathman emerged because older PostgreSQL versions had incomplete partitioning features. Now that native partitioned tables (declarative partitioning) are very mature, pg_pathman also recommends using native partitioned tables. Existing pg_pathman partitioned tables are also recommended to be migrated to native partitioned tables. pg_pathman, once recognized by many users, is now history. Even though it&amp;rsquo;s no longer updated, its feature set is still richer than the current native partitioned tables.
&lt;strong&gt;Feature Highlights&lt;/strong&gt;
pg_pathman is quite powerful, supporting some features that native partitioned tables do not. However, pathman is not perfect either and has many issues in practice. Key points to note about pg_pathman include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;pg_pathman can manage partitions through partition management functions. It supports replace, merge, split partition operations; attach and detach operations; and INTERVAL partitioning.&lt;/li&gt;
&lt;li&gt;pg_pathman has many optimizations for partitioned table execution plans.&lt;/li&gt;
&lt;li&gt;pg_pathman only supports RANGE and HASH partition types.&lt;/li&gt;
&lt;li&gt;The pathman_config table stores partition configuration information; it provides partition task views.&lt;/li&gt;
&lt;li&gt;Partition information is cached in memory for execution plan generation.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Basic pg_pathman Usage
 &lt;div id="basic-pg_pathman-usage" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#basic-pg_pathman-usage" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Creating pathman RANGE partitions&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The regular table serves as the parent table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; journal (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id SERIAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dt &lt;span style="color:#66d9ef"&gt;TIMESTAMP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt; INTEGER,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; msg TEXT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Indexes on the parent table are automatically created on child partitions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; journal(dt);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create partitions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;create_range_partitions(&lt;span style="color:#e6db74"&gt;&amp;#39;journal&amp;#39;&lt;/span&gt;::regclass, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;dt&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; interval &lt;span style="color:#e6db74"&gt;&amp;#39;1 month&amp;#39;&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;) ; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- View table definition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; journal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.journal&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;journal_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dt &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; msg &lt;span style="color:#f92672"&gt;|&lt;/span&gt; text &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;journal_dt_idx&amp;#34;&lt;/span&gt; btree (dt)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Child tables: journal_1,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_2,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_3,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_4,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_5,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; journal_6
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.journal_6&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+-----------------------------+-----------+----------+-------------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;journal_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dt &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; msg &lt;span style="color:#f92672"&gt;|&lt;/span&gt; text &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;journal_6_dt_idx&amp;#34;&lt;/span&gt; btree (dt)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Check&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pathman_journal_6_check&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (dt &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; dt &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Inherits&lt;/span&gt;: journal
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; journal (dt, &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt;, msg)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;, random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10000&lt;/span&gt;, md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-28&amp;#39;&lt;/span&gt;::date, &lt;span style="color:#e6db74"&gt;&amp;#39;1 hour&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert data for which no corresponding partition has been created yet
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; journal (dt, &lt;span style="color:#66d9ef"&gt;level&lt;/span&gt;, msg) &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-07-01&amp;#39;&lt;/span&gt;::date,&lt;span style="color:#e6db74"&gt;&amp;#39;11&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check partition data distribution; the INTERVAL partition has been automatically created
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; partition, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; journal &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; partition;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_7 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;649&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; journal_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;744&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- View execution plan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Partition pruning has occurred
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; journal &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; dt&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 22:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; journal journal_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (dt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 22:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; journal_1_dt_idx &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; journal_1 journal_1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;29&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (dt &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 22:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Creating pathman HASH partitions&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create parent table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; items (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id SERIAL &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name TEXT,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; code BIGINT);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create HASH partitions
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; create_hash_partitions(&lt;span style="color:#e6db74"&gt;&amp;#39;items&amp;#39;&lt;/span&gt;::regclass, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;id&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;false&lt;/span&gt;) ; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Insert data 
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; items (id, name, code)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;, md5(&lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;::text), random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100000&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;g&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; partition, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; items &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; partition;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;344&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;338&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; items
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.items&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+-----------+----------+-----------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;items_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; text &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; code &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;items_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Child tables: items_0,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_1,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; items_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.items_1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+-----------+----------+-----------------------------------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; nextval(&lt;span style="color:#e6db74"&gt;&amp;#39;items_id_seq&amp;#39;&lt;/span&gt;::regclass) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; text &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; code &lt;span style="color:#f92672"&gt;|&lt;/span&gt; bigint &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;items_1_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Check&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pathman_items_1_check&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (get_hash_part_idx(hashint4(id), &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Inherits&lt;/span&gt;: items
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; partition, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; items &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; partition;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;344&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;318&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; items_1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;338&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Pros and Cons of PostgreSQL Partitioned Tables
 &lt;div id="pros-and-cons-of-postgresql-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pros-and-cons-of-postgresql-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Advantages of Partitioned Tables
 &lt;div id="advantages-of-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#advantages-of-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;SQL performance improvement. In certain scenarios, such as splitting a large amount of data into multiple partitions where SQL only needs to query one partition, SQL performance can be dramatically improved.&lt;/li&gt;
&lt;li&gt;Partitions can work together with indexes. For example, accessing an index on a single partition is more efficient than accessing a large unpartitioned index.&lt;/li&gt;
&lt;li&gt;Dropping a single partition is much more efficient than deleting many rows. This is common in time-range partitioning — dropping an unused historical partition is very fast, but without partitioning, DELETE operations are not only slow but also require additional maintenance.&lt;/li&gt;
&lt;li&gt;VACUUM is faster. Reclaiming old version information or collecting statistics on a large table is very slow. If VACUUM hasn&amp;rsquo;t finished executing, SQL may already be experiencing problems. With partitioning, VACUUM becomes much faster.&lt;/li&gt;
&lt;li&gt;I/O distribution capability. Different partitions can be placed on different paths or different disks. Rarely-used data can be placed on cheaper disks.&lt;/li&gt;
&lt;li&gt;More maintenance techniques. Directly maintaining a very large table is difficult — for example, VACUUM on an extremely large table has many issues. With partitioned tables, each partition can run VACUUM independently. Moreover, ATTACH/DETACH, local indexes/constraints, and more can be flexibly used in many scenarios.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Disadvantages of Partitioned Tables
 &lt;div id="disadvantages-of-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#disadvantages-of-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;In PostgreSQL, every partition of a partitioned table can be treated as a regular table. Too many partitions can lead to longer SQL parsing times and higher memory load, even causing errors. See the previous article: &lt;a href="https://editor.csdn.net/md/?articleId=131497779" target="_blank" rel="noreferrer"&gt;Too many range table entries even with a modest number of partitions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Even if having too many partitions doesn&amp;rsquo;t cause errors, and partition pruning doesn&amp;rsquo;t happen during execution plan generation (it might happen during execution), the EXPLAIN output will be extremely long. At that point, the logs will also contain lengthy execution plans, affecting log readability.&lt;/li&gt;
&lt;li&gt;Some strange issues: &lt;a href="https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&amp;amp;mid=2247489813&amp;amp;idx=1&amp;amp;sn=22360e2bfd40fc2d0caed0a9d825b1d4&amp;amp;chksm=fa663124cd11b832953e789127927ffa0d63d6c948ca8934d5317b8eaae6e71374041ec038f7&amp;amp;mpshare=1&amp;amp;srcid=0728JrXnHdxnfgRVzqosBNcv&amp;amp;sharer_sharetime=1690509489198&amp;amp;sharer_shareid=0412ea33e50b471b98d8859a5c431367&amp;amp;from=singlemessage&amp;amp;scene=1&amp;amp;subscene=10000&amp;amp;sessionid=1690509419&amp;amp;clicktime=1690509545&amp;amp;enterid=1690509545&amp;amp;ascene=1&amp;amp;fasttmpl_type=0&amp;amp;fasttmpl_fullversion=6785798-en_US-zip&amp;amp;fasttmpl_flag=0&amp;amp;realreporttime=1690509545257&amp;amp;devicetype=android-29&amp;amp;version=28002658&amp;amp;nettype=WIFI&amp;amp;abtest_cookie=AAACAA%3D%3D&amp;amp;lang=en&amp;amp;countrycode=CN&amp;amp;exportkey=n_ChQIAhIQCCtq2jm3UsFznlVjxFEOWBLaAQIE97dBBAEAAAAAABKTCFyWAsoAAAAOpnltbLcz9gKNyK89dVj0LyxnG1pA6NiO6PHIsQ0Hy2N7QRbizb9SHdquaFOpOqANqG8jLDcioswZyRnYknjG4bSqNIIKm%2BpRIlK%2FVJxuwolH2%2FQJKSLg4YjccDktYYscUDvYSfHFx1ScEXZkOkbVqrvbBCPy6Gh2GnzulFuuIU68afNtsoBdzZTqHYbL0BfsAUhsz1iGAfSep642UT2CBpWSHWJQvndnwhZxjJ6%2FWO%2FI%2FqwncggiVeDNiv4vwXhluDNn&amp;amp;pass_ticket=mrpzS3wggBDzL9Ua2FmX5v1rYh6zKOnQ4og6oKcKv0ZXRfNBSUpSkGdTAcfXqgDo&amp;amp;wx_header=3" target="_blank" rel="noreferrer"&gt;Different users see different execution plans&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Limitations of Partitioned Tables
 &lt;div id="limitations-of-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#limitations-of-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;No native automatic partition creation feature&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Only local partition indexes are supported; global indexes are not supported&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Primary keys must include the partition key. PostgreSQL currently can only enforce uniqueness within each partition, hence this limitation. Oracle and MySQL do not have this restriction.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unique indexes must include the partition key. PostgreSQL currently can only enforce uniqueness within each partition. Same applies to primary keys.&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cannot create globally-defined constraints&lt;/strong&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;BEFORE ROW INSERT triggers cannot update the partition into which the row is being inserted.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Temporary table partitions and regular table partitions cannot coexist under the same partitioned table.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In declarative partitioning, parent and child table columns must be identical; in inheritance partitioning, child tables can have more columns than the parent table.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In declarative partitioning, CHECK and NOT NULL constraints are always inherited; these two constraints cannot be set independently on individual partitions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;RANGE partitions cannot store NULL values. HASH partitions have no concept of NULL partitions but can store NULL values — they are placed on the remainder 0 partition. LIST partitions can explicitly create a NULL partition to store NULL data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;When Should You Use Partitioned Tables?
 &lt;div id="when-should-you-use-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#when-should-you-use-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;First, to use partitioned tables you must understand the advantages, disadvantages, and limitations they bring. For example, when data volume is large, partitioning can improve performance; hot/cold data separation also makes partition data management easier. You should decide whether to partition and how to partition based on your specific business situation and hardware resources. However, developers will always ask questions like &amp;ldquo;how much data warrants partitioning.&amp;rdquo; Advice on using partitioned tables can only be given in general terms. If you don&amp;rsquo;t know how to partition, you can refer to the following recommendations (if you already have sufficient understanding of table partitioning, please ignore):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The table data is large enough, and SQL queries on the table always or can include the partition key column.&lt;/li&gt;
&lt;li&gt;Clear hot/cold data separation. For example, new data is always inserted into the current month&amp;rsquo;s partition, while the other 11 months of old partitions are read-only.&lt;/li&gt;
&lt;li&gt;VACUUM can no longer keep up.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Partition Table Permissions
 &lt;div id="partition-table-permissions" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-permissions" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Permission issues are less discussed in the context of partitioned table knowledge, but they are still worth paying attention to.
Because PostgreSQL has the concept that &amp;ldquo;partition child tables are also regular tables,&amp;rdquo; this differs from other common databases (Oracle, MySQL). For example, in Oracle you don&amp;rsquo;t need to worry about partition child table permissions, but in PostgreSQL you do.&lt;/p&gt;
&lt;p&gt;PARTITION OF / ATTACH do not inherit the parent table&amp;rsquo;s permissions to child tables:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Grant SELECT on the partitioned table to a regular user
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; userlzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;GRANT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check permissions — only the parent table has been granted; existing partition child tables are not automatically granted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; grantee,table_schema,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;,privilege_type &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.table_privileges &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; grantee&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;userlzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grantee &lt;span style="color:#f92672"&gt;|&lt;/span&gt; table_schema &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; privilege_type 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------------+---------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; userlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Create a partition using PARTITION OF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1_202303 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create a partition using ATTACH
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202304
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 attach partition lzlpartition1_202304 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check permissions again — newly created child partitions are not automatically granted to other users (but permissions are automatically granted to the owner)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; grantee,table_schema,&lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt;,privilege_type &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; information_schema.table_privileges &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; grantee&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;userlzl&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; grantee &lt;span style="color:#f92672"&gt;|&lt;/span&gt; table_schema &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table_name&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; privilege_type 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+--------------+---------------+----------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; userlzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;At this point, user &lt;code&gt;userlzl&lt;/code&gt; has no access permissions to any child tables, but has permissions on the parent table.
&lt;code&gt;userlzl&lt;/code&gt; can access partition data through the parent table, but cannot access data by directly querying child tables:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; userlzl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;You &lt;span style="color:#66d9ef"&gt;are&lt;/span&gt; now connected &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;dbmgr&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;userlzl&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-02 10:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;2159&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; d05d716da126ff4b44d934344cc4dd7a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLPARTITION1_202301 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-02 10:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42501&lt;/span&gt;: permission denied &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: aclcheck_error, aclchk.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3466&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since ATTACH/DETACH does not handle permissions, if we DETACH a partition at this point, that partition will also be inaccessible to &lt;code&gt;userlzl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202303;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dp&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202303;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;privileges&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Policies 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----------------------+-------+-------------------+-------------------+----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLPARTITION1_202301 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-02 10:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;42501&lt;/span&gt;: permission denied &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202301 &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;From this we can conclude:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partition child tables and the parent table exist as regular tables in PostgreSQL, each with their own permission system.&lt;/li&gt;
&lt;li&gt;If you lack child table permissions but have parent table permissions, you can still access child table data.&lt;/li&gt;
&lt;li&gt;PARTITION OF, ATTACH, and DETACH do not handle permission issues.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, partition table permissions do not merely control whether access is possible. Lacking partition child table permissions can lead to abnormal execution plans. Reference article: &lt;a href="https://mp.weixin.qq.com/s?__biz=MzUyOTAyMzMyNg==&amp;amp;mid=2247489813&amp;amp;idx=1&amp;amp;sn=22360e2bfd40fc2d0caed0a9d825b1d4&amp;amp;chksm=fa663124cd11b832953e789127927ffa0d63d6c948ca8934d5317b8eaae6e71374041ec038f7&amp;amp;mpshare=1&amp;amp;srcid=0728JrXnHdxnfgRVzqosBNcv&amp;amp;sharer_sharetime=1690509489198&amp;amp;sharer_shareid=0412ea33e50b471b98d8859a5c431367&amp;amp;from=singlemessage&amp;amp;scene=1&amp;amp;subscene=10000&amp;amp;sessionid=1690509419&amp;amp;clicktime=1690509545&amp;amp;enterid=1690509545&amp;amp;ascene=1&amp;amp;fasttmpl_type=0&amp;amp;fasttmpl_fullversion=6785798-en_US-zip&amp;amp;fasttmpl_flag=0&amp;amp;realreporttime=1690509545257&amp;amp;devicetype=android-29&amp;amp;version=28002658&amp;amp;nettype=WIFI&amp;amp;abtest_cookie=AAACAA%3D%3D&amp;amp;lang=en&amp;amp;countrycode=CN&amp;amp;exportkey=n_ChQIAhIQCCtq2jm3UsFznlVjxFEOWBLaAQIE97dBBAEAAAAAABKTCFyWAsoAAAAOpnltbLcz9gKNyK89dVj0LyxnG1pA6NiO6PHIsQ0Hy2N7QRbizb9SHdquaFOpOqANqG8jLDcioswZyRnYknjG4bSqNIIKm%2BpRIlK%2FVJxuwolH2%2FQJKSLg4YjccDktYYscUDvYSfHFx1ScEXZkOkbVqrvbBCPy6Gh2GnzulFuuIU68afNtsoBdzZTqHYbL0BfsAUhsz1iGAfSep642UT2CBpWSHWJQvndnwhZxjJ6%2FWO%2FI%2FqwncggiVeDNiv4vwXhluDNn&amp;amp;pass_ticket=mrpzS3wggBDzL9Ua2FmX5v1rYh6zKOnQ4og6oKcKv0ZXRfNBSUpSkGdTAcfXqgDo&amp;amp;wx_header=3" target="_blank" rel="noreferrer"&gt;Different users see different execution plans&lt;/a&gt;
This issue is an intermittent phenomenon that causes superusers and regular users to see different SQL execution plans. The actual business SQL execution plan is abnormal but goes unnoticed, making it difficult to diagnose. Partition child tables have their own statistics, and child table permissions are inconsistent with the parent table (even for partitions created via PARTITION OF), resulting in users being able to access child table data through the parent table but unable to view the child table&amp;rsquo;s statistics. This permission issue leads to differences in execution plans.
This contradicts the general concept that &amp;ldquo;&lt;em&gt;permissions only control whether you can access a table, not how you access it&lt;/em&gt;,&amp;rdquo; so attention must be paid to this permission issue.
To provide permission for child table statistics, it is recommended to explicitly grant SELECT on all child tables to the user, which avoids the issues above:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;grant&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_partition_allname &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; username;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;Partition Table Maintenance
 &lt;div id="partition-table-maintenance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-maintenance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;ATTACH/DETACH Basic Operations
 &lt;div id="attachdetach-basic-operations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#attachdetach-basic-operations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;ATTACH/DETACH can add/detach an existing table as a partition of/detach from a partitioned table. ATTACH/DETACH is very useful in maintenance work.
First, let&amp;rsquo;s look at the locking behavior of adding partitions via &amp;ldquo;CREATE TABLE &amp;hellip; PARTITION OF&amp;rdquo; and deleting partitions via &amp;ldquo;DROP TABLE&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;Lock Matrix: &lt;a href="https://www.postgresql.org/docs/current/explicit-locking.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/explicit-locking.html&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Lock Requests: &lt;a href="https://postgres-locks.husseinnasser.com" target="_blank" rel="noreferrer"&gt;https://postgres-locks.husseinnasser.com&lt;/a&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Adding a partition via PARTITION OF&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start a transaction, read-only data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;8249&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;256&lt;/span&gt;ac66bb53d31bc6124294238d6410c &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status. When reading data from one partition, locks are acquired on both the child partition and the parent table.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+-----------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Add a partition via PARTITION OF
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1_202305 partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; LZLPARTITION1 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check locks again
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#75715e"&gt;-- This is the PARTITION OF session
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 4: Run an arbitrary query
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 4: Check locks again
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;84774&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#75715e"&gt;-- Query is blocked&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When adding a partition via PARTITION OF, an AccessExclusiveLock is requested on the parent table. This waits for all transactions on the parent table and also blocks all transactions on the parent table.



&lt;img src="https://lastdba.com/img/csdn/851906be0f93.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;Although the PARTITION OF statement itself executes quickly, if there are long-running transactions on the parent table, all operations on the partitioned table will stall for an extended period. Without a maintenance window, using PARTITION OF to add partitions directly is not recommended.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;&lt;strong&gt;Dropping a partition via DROP TABLE&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start another read-only transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Drop a child partition of the partitioned table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202305;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Dropping a child partition with DROP TABLE requests an AccessExclusiveLock on the parent table, waiting for all and blocking all. Similarly, this must be used with caution in production environments.&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;&lt;strong&gt;ATTACH — adding a partition&lt;/strong&gt;
ATTACH attaches an existing regular table to a partitioned table.
Although both ATTACH and PARTITION OF can add partitions, note that &lt;strong&gt;ATTACH does not automatically create indexes, constraints, default values, or row-level triggers&lt;/strong&gt; — this differs from PARTITION OF.
First, create a table:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- To reduce tedious DDL, use LIKE to create the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202305
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Now observe whether ATTACH is blocked:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start a read-write transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1234&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- DML statements acquire RowExclusiveLock on the partition parent table and the corresponding partition child table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: ATTACH the newly created table to the partition parent table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 attach partition lzlpartition1_202305 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-06-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ATTACH only requests a SHARE UPDATE EXCLUSIVE lock, which is much lighter than ACCESS EXCLUSIVE.



&lt;img src="https://lastdba.com/img/csdn/b23cc350250f.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;ATTACH does not block reads or writes, so ATTACH is recommended for adding partitions — it does not affect business operations and can be executed online.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;&lt;strong&gt;DETACH — removing a partition&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;DETACH removes a partition from the partitioned table, turning it into a regular table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Keep the DML transaction uncommitted
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: DETACH a partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202305;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;311449&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;308525&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Unlike ATTACH, DETACH requests an AccessExclusiveLock on the parent table, waiting for all and blocking all.&lt;/p&gt;
&lt;ol start="5"&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DETACH CONCURRENTLY&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Starting from PostgreSQL 14, DETACH gained two new syntax variants: CONCURRENTLY and FINALIZE.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;blockquote&gt;&lt;p&gt;ALTER TABLE [ IF EXISTS ] &lt;em&gt;&lt;code&gt;name&lt;/code&gt;&lt;/em&gt;
DETACH PARTITION &lt;em&gt;&lt;code&gt;partition_name&lt;/code&gt;&lt;/em&gt; [ CONCURRENTLY | FINALIZE ]&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;DETACH CONCURRENTLY internally starts two transactions. The first transaction requests a SHARE UPDATE EXCLUSIVE lock on both the parent and child tables, marking the partition as being in a detaching state, at which point it waits for all transactions on the partitioned table to commit. Once all those transactions have committed, the second transaction requests a SHARE UPDATE EXCLUSIVE lock on the parent table and an ACCESS EXCLUSIVE lock on that child table, after which DETACH CONCURRENTLY completes.&lt;/p&gt;
&lt;p&gt;Additionally, after DETACH CONCURRENTLY, the detached child table retains its constraint — the partition constraint is converted into a CHECK constraint on the detached table.&lt;/p&gt;
&lt;p&gt;DETACH CONCURRENTLY limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;DETACH CONCURRENTLY cannot be placed inside a transaction block.&lt;/li&gt;
&lt;li&gt;The partitioned table cannot have a DEFAULT partition.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Locking behavior of CONCURRENTLY:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1234&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: DETACH CONCURRENTLY
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 concurrently;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1234&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;); &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Client &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ClientRead
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 concurrently; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Lock&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3947&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pid,query,wait_event_type,wait_event &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_stat_activity; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The DETACH session is 3940. Interestingly, the DETACH wait event is virtualxid, and the wait event type is Lock.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check lock details
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; locktype,&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;,relation,virtualtransaction,pid,&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; pid &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualtransaction &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------+----------+----------+--------------------+------+------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16387&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40969&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;16387&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;40963&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;179&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- At this point, DETACH is not yet waiting for a table-level lock; it is waiting for a ShareLock on virtualxid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 4: Try an insert
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;12345&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; relation &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;found&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DETAIL: Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; the failing &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;contains&lt;/span&gt; (date_created) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;).
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;12345&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- The detaching partition can no longer accept inserts, but other partitions can.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- What if we insert directly into the partition? It works fine.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;12345&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Note: at this point it is still a partition of the partitioned table, not yet a regular table, but it has been marked as unavailable.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- \d+ shows the partition in DETACH PENDING state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) (DETACH PENDING),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202302 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Commit/rollback the insert session (Session 1)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2 completes immediately
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 concurrently;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;FINALIZE:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=*&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1234&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abcd&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 01:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: DETACH CONCURRENTLY, manually canceled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 concurrently;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;^&lt;/span&gt;CCancel request sent
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: canceling &lt;span style="color:#66d9ef"&gt;statement&lt;/span&gt; due &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;user&lt;/span&gt; request
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- \d+ shows the partition in DETACH PENDING state
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) (DETACH PENDING),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202302 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- In DETACH PENDING state, SQL no longer accesses this partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;752&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;81&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;38881&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Use FINALIZE to complete the detach
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 finalize; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;lzldb&lt;span style="color:#f92672"&gt;-#&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+------+--------------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3940&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareUpdateExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3691&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- 3940, FINALIZE requests ShareUpdateExclusiveLock on the parent table and AccessExclusiveLock on the child table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Since the inserted data happened to be in the detaching partition, it is waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1 ends
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=!&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rollback&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ROLLBACK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2 completes immediately
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301 finalize; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Although DETACH requests an 8-level lock on the partition, generally business operations don&amp;rsquo;t write directly through child partitions, so you only need to ensure that long-running transactions on the partitioned table complete quickly. Usually, there&amp;rsquo;s no need to worry about subsequent blocking on that partition&amp;rsquo;s child table.&lt;/p&gt;
&lt;p&gt;Online DETACH summary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The blocking behavior of DETACH CONCURRENTLY is somewhat similar to CIC (CREATE INDEX CONCURRENTLY) — it does not block other transactions, but it itself waits for existing transactions to complete. This is not easily visible from lock information alone.&lt;/li&gt;
&lt;li&gt;During DETACH CONCURRENTLY, the partition enters a DETACH PENDING intermediate state. This state is somewhat like INVISIBLE — SQL will not find this partition.&lt;/li&gt;
&lt;li&gt;If DETACH PENDING is caused by long-running transactions, promptly end those transactions; if it&amp;rsquo;s caused by interruption, use FINALIZE to complete the detach.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Using Constraints to Reduce ATTACH Time
 &lt;div id="using-constraints-to-reduce-attach-time" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#using-constraints-to-reduce-attach-time" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Partition data overview — prepare to ATTACH a relatively large partition:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; partition, &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; partition;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2592001&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzlpartition1_202302 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;38881&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note: this 202301 partition has a PARTITION CONSTRAINT:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: lzlpartition1 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="2"&gt;
&lt;li&gt;DETACH the partition:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After DETACH, the PARTITION CONSTRAINT is gone
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;ATTACH without adding a CHECK constraint:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 attach partition lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;343&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;498&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Because it must scan the partition data to verify it satisfies the partition range, ATTACH took 300+ ms.&lt;/p&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Add a CHECK constraint first, then ATTACH:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 detach partition lzlpartition1_202301;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202301 &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;355&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;458&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The time taken to add the CHECK constraint is roughly the same as the ATTACH operation without a CHECK — because adding a CHECK constraint also needs to scan and validate all data.
Once the CHECK constraint is added, the subsequent ATTACH completes very quickly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 attach partition lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;480&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="5"&gt;
&lt;li&gt;Drop the CHECK constraint:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: lzlpartition1 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_pkey&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Check&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraints&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;chk_202301&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; (date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Note: CHECK CONSTRAINT and PARTITION CONSTRAINT are different concepts, even though their constraint content can be identical. ATTACH uses the CHECK constraint but does not merge it. You can explicitly drop this redundant CHECK:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202301;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Additionally, note that DROP CONSTRAINT requests an AccessExclusiveLock on the current child partition — this is the highest-level lock and blocks all operations. So, if there are transactions on that child partition, be cautious with DROP CONSTRAINT.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;444399&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;444399&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#75715e"&gt;-- This is the DROP CONSTRAINT session
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So,
&lt;strong&gt;When ATTACH-ing a partition, adding a CHECK constraint beforehand is useful — it reduces ATTACH execution time. The data validation just needs to be completed before ATTACH.&lt;/strong&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;The Correct Way to Add Partitions to a Partitioned Table
 &lt;div id="the-correct-way-to-add-partitions-to-a-partitioned-table" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-correct-way-to-add-partitions-to-a-partitioned-table" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;We now know that ATTACH can be executed online, while PARTITION OF / DROP TABLE / DETACH all request an AccessExclusiveLock that waits for and blocks everything.
So,
&lt;strong&gt;It is recommended to use ATTACH to create new partitions. PARTITION OF / DETACH both wait for and block all transactions, while ATTACH is not blocked by read-only/DML transactions.&lt;/strong&gt;
Therefore, adding partitions should use ATTACH, and a CHECK constraint should be created beforehand. When dropping constraints, be mindful of long-running transactions.
&lt;strong&gt;The correct way to add a partition to a partitioned table&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- To reduce tedious DDL, use LIKE to create the table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Refer to the PARTITION CONSTRAINT of other partitions, add a CHECK constraint on the table to reduce ATTACH constraint validation time
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202303 &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add partition using ATTACH
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1 attach partition LZLPARTITION1_202303 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Optional. Drop the redundant CHECK constraint before transactions start on the new partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202303;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Locks on Partition Indexes
 &lt;div id="locks-on-partition-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#locks-on-partition-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;Creating/dropping partition indexes during read-only transactions&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;When a partition has a shared lock (AccessShareLock), meaning there is a query transaction on the partitioned table:
CREATE INDEX ON lzlpartition1 succeeds (note: without CONCURRENTLY); DROP INDEX lzlpartition1 fails:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start a transaction, read data from the partitioned table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-02 00:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;86401&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Create index, succeeds
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_datecreated &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1(date_created);;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Drop index, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_datecreated;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+---------------------------+------------+---------------+--------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301_pkey &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99598&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CREATE INDEX does not request an AccessExclusiveLock on the table, but DROP INDEX does.
From this example we can conclude:
&lt;strong&gt;Read-only transactions do not block CREATE INDEX, but they do block DROP INDEX.&lt;/strong&gt;&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Creating/dropping partition indexes during update transactions&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Start an update transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 10:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Create partition index, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_datecreated &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+---------------------------+------------+---------------+--------+------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301_pkey &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99598&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The CREATE INDEX session (99598) requests a ShareLock on the partition parent table; the DML transaction session (300371) holds RowExclusiveLock on the child partition and parent table.



&lt;img src="https://lastdba.com/img/csdn/9fc4b97314bd.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;CREATE INDEX (without CONCURRENTLY) requests ShareLock on the parent table;
Read-only transactions request AccessShareLock on the parent and child tables;
Update transactions request RowExclusiveLock on the parent and child tables;
==&amp;gt;
AccessShareLock does not block ShareLock, so queries do not block CREATE INDEX (without CONCURRENTLY);
RowExclusiveLock blocks ShareLock, so DML blocks CREATE INDEX (without CONCURRENTLY);&lt;/p&gt;
&lt;ol start="3"&gt;
&lt;li&gt;Creating partitioned indexes with CONCURRENTLY&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;Note: You cannot create indexes with CONCURRENTLY on a partitioned table.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;A000: cannot &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; partitioned &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1&amp;#34;&lt;/span&gt; concurrently
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: DefineIndex, indexcmds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;665&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;There is a patch at &lt;a href="https://commitfest.postgresql.org/35/2815/" target="_blank" rel="noreferrer"&gt;https://commitfest.postgresql.org/35/2815/&lt;/a&gt; working on solving this issue.&lt;/p&gt;
&lt;p&gt;Currently, you can create indexes with CONCURRENTLY on individual partition child tables:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1: Still using the previous DML transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Create index with CONCURRENTLY on a child table, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202301 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+---------------------------+------------+---------------+--------+--------------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301_pkey &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;99598&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareUpdateExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;300371&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With CONCURRENTLY, the requested lock is one level lower and &lt;strong&gt;no longer conflicts&lt;/strong&gt; with ROW EXCL. The locks don&amp;rsquo;t conflict, so why is CONCURRENTLY itself still blocked?&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;it must wait for all existing transactions that could potentially modify or use the index to terminate.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The official documentation explains that CONCURRENTLY needs to wait for transactions that could potentially modify or use the index to terminate. In our case, the UPDATE statement modified the indexed column, so CONCURRENTLY needs to wait for it to complete.
&lt;strong&gt;Although CONCURRENTLY itself hasn&amp;rsquo;t completed due to the prior DML statement, there&amp;rsquo;s a benefit: CONCURRENTLY does not block subsequent DML statements.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- While CONCURRENTLY has not yet completed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 4: Update a record
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 12:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Summary of partition index locking issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Locking for read-only/read-write/index creation on partitioned tables is similar to regular tables. Just note that transactions acquire locks on both the partition parent table and child tables, so when subsequent blocking chains involve heavier locks, all partitions are affected.&lt;/li&gt;
&lt;li&gt;Read-only transactions do not block CREATE INDEX, but they do block DROP INDEX.&lt;/li&gt;
&lt;li&gt;DML blocks CREATE INDEX and also blocks CREATE INDEX CONCURRENTLY, but CONCURRENTLY does not block DML.&lt;/li&gt;
&lt;li&gt;Although CREATE INDEX on a partitioned table automatically creates indexes on all existing and future partitions, it is not recommended for direct use in production due to blocking issues.&lt;/li&gt;
&lt;li&gt;You cannot use CONCURRENTLY directly on the partition parent table, so you need to create indexes with CONCURRENTLY on each partition child table.&lt;/li&gt;
&lt;li&gt;CONCURRENTLY does not block subsequent transactions but itself gets blocked by prior long-running transactions and may cause the created index to be invalid. Attention must be paid to long-running transactions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;The Correct Way to Create Partition Indexes
 &lt;div id="the-correct-way-to-create-partition-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-correct-way-to-create-partition-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Although you cannot create indexes with CONCURRENTLY on a partitioned table, you can create indexes with CONCURRENTLY on partition child tables using the following syntax:
&lt;code&gt;CREATE INDEX ON ONLY&lt;/code&gt; : Creates an invalid index on the parent table; does not automatically create indexes on child partitions.
&lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; : Creates an index with CONCURRENTLY on a child partition.
&lt;code&gt;ALTER INDEX .. ATTACH PARTITION&lt;/code&gt; : Attaches the partition index to the parent index. After all child partition indexes have been attached, the partition parent table index is automatically marked as valid.
However, when executing these commands, you still need to pay attention to locking behavior.&lt;/p&gt;
&lt;p&gt;Below, observe the lock requests and blocking for the above two statements:
(DML explicit transaction in Session 1 is kept open throughout)&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Blocking behavior of CREATE INDEX ON ONLY:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; IDX_DATECREATED &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Waiting
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check lock status
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+--------+------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;448243&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; RowExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;444399&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; ShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CREATE INDEX ON ONLY requests a ShareLock. ShareLock and RowExclusiveLock block each other. So, although ONLY itself executes very quickly, CREATE INDEX ON ONLY should not be used casually either.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After the DML transaction ends, CREATE INDEX ON ONLY completes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;idx_datecreated&amp;#34;&lt;/span&gt; btree (date_created) INVALID&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;CREATE INDEX ON ONLY&lt;/code&gt; creates an invalid index on the partition parent table and does not create indexes on child partitions.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Blocking behavior of ATTACH index:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After ONLY index creation completes, start another DML explicit transaction in Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1111&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Create index with CONCURRENTLY on child partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202302 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- 202302 partition index created
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202304 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- 202304 partition index created
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202301 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---- Creating 202301 partition index, waiting&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;CONCURRENTLY waits for transactions that might use the index to complete. Our explicit transaction only inserted into the 202301 partition, so only this partition&amp;rsquo;s CONCURRENTLY index creation hasn&amp;rsquo;t completed.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Complete the DML explicit transaction in Session 1, wait for the index to finish, then start another transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;commit&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;1111&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:01&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: ATTACH index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202302;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- ATTACH successful
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; idx_datecreated
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.idx_datecreated&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Definition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+------+--------------+---------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; yes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;btree, &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1&amp;#34;&lt;/span&gt;, invalid
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: idx_datecreated_202302 &lt;span style="color:#75715e"&gt;-- 202302 child partition index has been attached, index still invalid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: btree
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Attach the remaining child partition indexes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202301;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- ATTACH successful
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202304;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#75715e"&gt;-- ATTACH successful
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After all child partition indexes are attached, the parent table index automatically becomes valid
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; idx_datecreated
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Partitioned &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.idx_datecreated&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;&lt;span style="color:#f92672"&gt;?&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Definition &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+------+--------------+---------+--------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; yes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;btree, &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.lzlpartition1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: idx_datecreated_202301,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_datecreated_202302,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx_datecreated_202304
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: btree&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ATTACH is not blocked by DML and completes immediately. At this point, new partitions created via PARTITION OF will also automatically get the child partition index.&lt;/p&gt;
&lt;p&gt;In summary,&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;CREATE INDEX ON ONLY&lt;/code&gt; requests a &lt;code&gt;ShareLock&lt;/code&gt;, which mutually blocks with the &lt;code&gt;RowExclusiveLock&lt;/code&gt; requested by DML.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; requests a &lt;code&gt;ShareUpdateExclusiveLock&lt;/code&gt;, which does not block the &lt;code&gt;RowExclusiveLock&lt;/code&gt; requested by DML. However, &lt;code&gt;CREATE INDEX CONCURRENTLY&lt;/code&gt; needs to wait for DML transactions to complete before it can finish (CONCURRENTLY can acquire the lock but cannot complete).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ALTER INDEX .. ATTACH PARTITION&lt;/code&gt; requests an &lt;code&gt;AccessShareLock&lt;/code&gt;, which is the lightest lock and does not block the &lt;code&gt;RowExclusiveLock&lt;/code&gt; requested by DML.&lt;/li&gt;
&lt;li&gt;Queries request &lt;code&gt;AccessShareLock&lt;/code&gt;, the lightest lock. Unless DDL requests &lt;code&gt;AccessExclusiveLock&lt;/code&gt; (the heaviest lock), blocking does not occur.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, directly running CREATE INDEX on a partition blocks DML and is not acceptable.
&lt;strong&gt;The correct way to create partition indexes&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Use ONLY to create an invalid index on the partition parent table. Fast, but blocks subsequent DML, affects business — watch for long-running transactions.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; IDX_DATECREATED &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ONLY&lt;/span&gt; lzlpartition1(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Use CONCURRENTLY to create indexes on each partition child table. Slow, does not block subsequent DML, does not affect business, but watch for long-running DML transactions to prevent failure.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; concurrently idx_datecreated_202302 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302(date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- ATTACH all indexes. Fast, does not cause business blocking.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_datecreated ATTACH PARTITION idx_datecreated_202302;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Adding Primary Keys and Unique Indexes to Partitioned Tables
 &lt;div id="adding-primary-keys-and-unique-indexes-to-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#adding-primary-keys-and-unique-indexes-to-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;A &amp;ldquo;primary key index&amp;rdquo; is functionally equivalent to &amp;ldquo;unique index + NOT NULL constraint&amp;rdquo; (but there can only be one primary key). Creating unique indexes on partitioned tables can follow the index creation best practices above: ONLY on parent, CONCURRENTLY on children, ATTACH.
However, while primary keys on regular tables support the USING INDEX syntax, partitioned tables currently do not support this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;ADD&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINT&lt;/span&gt; pk_id_date_created &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; idx_uniq;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;A000: &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#f92672"&gt;/&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ADD&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;USING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; supported &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; partitioned tables
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: ATExecAddIndexConstraint, tablecmds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;8032&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In other words, you can create a NOT NULL unique index by pre-creating a NOT NULL constraint + ATTACH-ing indexes, but the final step of USING INDEX to add the primary key does not work.&lt;/p&gt;
&lt;p&gt;Now let&amp;rsquo;s look at the blocking behavior of directly adding/dropping primary keys:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Directly dropping a primary key:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;begin&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;318&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 22:00:00&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------------+---------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;7715&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; beee680a86e1d12790489e9ab4a4351b &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Session 2: Drop primary key, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; lzlpartition1_pkey;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- Session 3: Observe
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+---------------------------+------------+---------------+-------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301_pkey &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;95016&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;95016&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Dropping a primary key requests an AccessExclusiveLock, blocking everything.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;Directly adding a primary key:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1 transaction ends; Session 2&amp;#39;s drop primary key completes
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 1 starts another read-only transaction
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 2: Add a primary key on the partitioned table, waits
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;ADD&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;(id, date_created);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Session 3: Observe locks
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; l.locktype,d.datname,r.relname,l.virtualxid,l.transactionid,l.pid,l.&lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt;,l.&lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_locks l &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_database d &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.&lt;span style="color:#66d9ef"&gt;database&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;d.oid &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_class r &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; l.relation&lt;span style="color:#f92672"&gt;=&lt;/span&gt;r.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname &lt;span style="color:#66d9ef"&gt;like&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;%lzlpartition1%&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; locktype &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; virtualxid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; transactionid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;mode&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;granted&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+---------+----------------------+------------+---------------+-------+---------------------+---------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1_202301 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;95016&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;95016&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessExclusiveLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#75715e"&gt;-- Session adding primary key
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relation &lt;span style="color:#f92672"&gt;|&lt;/span&gt; dbmgr &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlpartition1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;21659&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; AccessShareLock &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Adding a primary key requests an AccessExclusiveLock on the parent table, blocking everything.
Adding an index on a partitioned table is very slow, and a primary key causes subsequent blocking. Currently, there is no low-impact way to add a primary key on a partitioned table. As a workaround, you can consider using the &amp;ldquo;ATTACH unique index + NOT NULL constraint&amp;rdquo; approach; or you may have to schedule a long maintenance window for the partitioned table business and wait for index creation to complete; or use a third-party sync tool to insert data into a partitioned table that already has the primary key.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Adding Partitions to HASH Partitioned Tables
 &lt;div id="adding-partitions-to-hash-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#adding-partitions-to-hash-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;If the new number of partitions is an integer multiple of the old number, we can know which old partition the data in the new partition came from. For example, expanding a 3-partition HASH partitioned table to 6 partitions, we can determine the data source:



&lt;img src="https://lastdba.com/img/csdn/84a32ff4147c.png" alt="" /&gt;&lt;/p&gt;
&lt;p&gt;Although understanding this simple data characteristic is helpful, in practice it may not be very useful, because new HASH partitions are always populated by brute-force INSERT. In terms of operations, going from &amp;ldquo;3→4&amp;rdquo; partitions is no different from &amp;ldquo;3→6&amp;rdquo;.
Mature data sync tools are now widely available. For example, using DTS to insert the table into a new table and then performing a table switch — this results in very short downtime and should be the preferred approach in production.
Below is primarily testing and observing the manual addition of integer-multiple partitions to a HASH partitioned table:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Partition info:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3377&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3354&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; orders_p2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3369&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;pre&gt;&lt;code&gt;2. DETACH partitions:
 Adding 3 more partitions to a 3-partition HASH native partitioned table:
&lt;/code&gt;&lt;/pre&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders DETACH PARTITION orders_p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders DETACH PARTITION orders_p2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders DETACH PARTITION orders_p3;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="3"&gt;
&lt;li&gt;RENAME partitions:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p1 &lt;span style="color:#66d9ef"&gt;RENAME&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; bak_orders_p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p2 &lt;span style="color:#66d9ef"&gt;RENAME&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; bak_orders_p2;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p3 &lt;span style="color:#66d9ef"&gt;RENAME&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; bak_orders_p3;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="4"&gt;
&lt;li&gt;Create 6 HASH partitions on the old table:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p1 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p2 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p3 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p4 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p5 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; orders_p6 PARTITION &lt;span style="color:#66d9ef"&gt;OF&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (MODULUS &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, REMAINDER &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="5"&gt;
&lt;li&gt;View partition info:
Note the function used in the partition constraint:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; orders_p1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;public.orders_p1&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------+-----------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; order_id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: orders &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WITH&lt;/span&gt; (modulus &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, remainder &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, order_id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Calculate which new partition old partition data should be inserted into.
For example, the old modulus 3, remainder 0 partition&amp;rsquo;s data needs to be split into the modulus 6, remainder 0 and remainder 3 partitions:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bak_orders_p1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1776&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bak_orders_p1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1601&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; bak_orders_p1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;3377&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="6"&gt;
&lt;li&gt;Insert data directly into partition child tables:
You can insert data directly into the corresponding partition child tables rather than through the partition parent table:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p1 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p2 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p3 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p3 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p4 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p5 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; orders_p6 &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; bak_orders_p3 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; satisfies_hash_partition(&lt;span style="color:#e6db74"&gt;&amp;#39;412053&amp;#39;&lt;/span&gt;::oid, &lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;, order_id)&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;true&lt;/span&gt;;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ol start="7"&gt;
&lt;li&gt;Verify data from 3 old partitions has been inserted into 6 new partitions:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; tableoid::regclass,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; orders &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; tableoid::regclass;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;tableoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p3 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1665&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p5 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1678&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1776&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p6 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1689&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p4 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1601&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;orders_p2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1691&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Changing Column Length on Partitioned Tables Rebuilds Indexes
 &lt;div id="changing-column-length-on-partitioned-tables-rebuilds-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#changing-column-length-on-partitioned-tables-rebuilds-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Modifying a column involves three considerations: table rewrite, index rebuild, and statistics loss.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Changing column type or reducing column length rewrites the table.&lt;/li&gt;
&lt;li&gt;Increasing column length only causes statistics loss; an exception is reducing the length (or changing int4 to int8), which rewrites the table.&lt;/li&gt;
&lt;li&gt;Increasing column length does not rebuild indexes, with one exception: increasing column length on a partitioned table rebuilds indexes (if the column has an index).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For column modifications, refer to the PostgreSQL apprentice.&lt;/p&gt;
&lt;p&gt;Here we mainly test the scenario of &lt;em&gt;increasing column length on a partitioned table&lt;/em&gt;. If an index exists, it may cause transaction blocking on the partitioned table.
Regular table, increasing the length of an indexed column:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create regular table and index
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t111(id int,name varchar(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; t111 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1001&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;abc&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx111 &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t111(name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Index file relfilenode is 417728
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx111&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417728&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Increase column length
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; t111 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Index file relfilenode is still 417728, unchanged. Regular table index was NOT rebuilt.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;idx111&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417728&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Partitioned table, increasing the length of an indexed column:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create an index on the partitioned table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; idx_name &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1(name);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INDEX&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Check the index on one partition
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; lzlpartition1_202301
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;dbmgr.lzlpartition1_202301&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Column&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Collation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Nullable&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Default&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Storage&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Stats target &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; integer &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; character varying(&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt;) &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; extended &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; now() &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plain &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: lzlpartition1 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-02-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzlpartition1_202301_name_idx&amp;#34;&lt;/span&gt; btree (name)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Access&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;method&lt;/span&gt;: heap
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301_name_idx&amp;#39;&lt;/span&gt;) idx,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301&amp;#39;&lt;/span&gt;) tbl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tbl 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417810&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417800&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Increase the indexed column length — partitioned table index is rebuilt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301_name_idx&amp;#39;&lt;/span&gt;) idx,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301&amp;#39;&lt;/span&gt;) tbl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tbl 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417814&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417800&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Reduce the indexed column length — partitioned table is rewritten
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;609&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;585&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301_name_idx&amp;#39;&lt;/span&gt;) idx,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301&amp;#39;&lt;/span&gt;) tbl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tbl 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417828&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417825&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Keep the indexed column length the same — partitioned table index is still rebuilt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; name &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301_name_idx&amp;#39;&lt;/span&gt;) idx,pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzlpartition1_202301&amp;#39;&lt;/span&gt;) tbl;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; idx &lt;span style="color:#f92672"&gt;|&lt;/span&gt; tbl 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------+-------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417834&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16398&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;417825&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For regular tables, increasing column length only requires attention to statistics loss (except int to bigint). However, for partitioned tables, when increasing column length, if the column has an index, not only are statistics lost but the index is also rebuilt. Since ALTER COLUMN is an 8-level lock, the index rebuild period causes extended blocking.
Recommendation: first drop the index, modify the column, then rebuild the index using the &amp;ldquo;parent table ONLY + child tables CIC + ATTACH&amp;rdquo; approach.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Partition Table Maintenance Summary
 &lt;div id="partition-table-maintenance-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-maintenance-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;PARTITION OF / DROP TABLE / DETACH require ACCESS EXCLUSIVE locks. ATTACH / DETACH CONCURRENTLY are recommended — they do not cause blocking. For DETACH CONCURRENTLY, watch for existing long-running transactions.&lt;/li&gt;
&lt;li&gt;Before ATTACH-ing a partition, you can pre-create a constraint on the partition. This eliminates the time spent scanning partition data during ATTACH.&lt;/li&gt;
&lt;li&gt;Currently, CIC (CREATE INDEX CONCURRENTLY) is not supported on partitioned tables. You can create partition indexes using the &amp;ldquo;ONLY on parent + CONCURRENTLY on children + ATTACH index&amp;rdquo; approach to reduce business blocking time.&lt;/li&gt;
&lt;li&gt;Partitioned tables do not support the USING INDEX method for creating primary keys.&lt;/li&gt;
&lt;li&gt;Pay attention to the exceptional case of modifying column length on partitioned tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;Partition Table Optimization
 &lt;div id="partition-table-optimization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-table-optimization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Partition Pruning
 &lt;div id="partition-pruning" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-pruning" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Partition Pruning can improve performance for declarative partitioning and is a very important feature for partitioned table optimization. Without partition pruning, queries would scan all partitions. With partition pruning, the optimizer can filter out partitions that don&amp;rsquo;t need to be accessed through the WHERE condition.



&lt;img src="https://lastdba.com/img/csdn/574daf83f7c1.png" alt="Partition pruning" /&gt;
Partition pruning relies on the PARTITION CONSTRAINT (visible with \d+), which means &lt;strong&gt;queries must include partition key conditions&lt;/strong&gt; for pruning to occur. This constraint differs from regular CHECK constraints — it is automatically created when the partition is created.
Partition pruning is controlled by the &lt;code&gt;enable_partition_pruning&lt;/code&gt; parameter, which defaults to on.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Without partition pruning, all partitions are accessed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partition_pruning&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With partition pruning enabled, partitions that don&amp;#39;t need to be accessed are excluded
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partition_pruning&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;(The official documentation says pruning happens during execution plan generation, and EXPLAIN would show &amp;ldquo;Subplans Removed.&amp;rdquo; In testing, this isn&amp;rsquo;t always the case, as in the EXPLAIN example above.)
&lt;strong&gt;Partition pruning can occur at two stages: during execution plan generation, and during actual execution.&lt;/strong&gt;
Why does this happen? Because sometimes only at execution time can we know which partitions can be pruned. There are two scenarios:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Parameterized Nested Loop Joins: The parameter from the outer side of the
join can be used to determine the minimum set of inner side partitions to
scan.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Initplans: Once an initplan has been executed we can then determine which
partitions match the value from the initplan.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Simulating runtime pruning: When fetching data from another table, the optimizer certainly doesn&amp;rsquo;t know what the data is, so it cannot use that as a basis for partition pruning during plan generation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Create another table
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; x(date_created &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; x &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-01 09:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Generate execution plan only, don&amp;#39;t execute — no pruning occurred
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; date_created &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; x);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1904&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;68&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1904&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; x (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2260&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Execute the SQL — pruning occurred. Notice the &amp;#34;never executed&amp;#34; keyword.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;(&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; date_created &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; x);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1904&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;68&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1904&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;680&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;682&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; InitPlan &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; x (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2260&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;013&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;014&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;07&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;029&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;676&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;008&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;652&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;45382&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (never executed)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) (never executed)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;157&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;732&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 class="relative group"&gt;Partition Wise Join
 &lt;div id="partition-wise-join" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-wise-join" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Partition wise join can reduce the cost of partition joins.
Suppose there are two partitioned tables t1 and t2, both with 3 partitions (p1, p2, p3) with identical partition definitions. t1 has 10 rows per partition, t2 has 20 rows per partition:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;&lt;/th&gt;
 &lt;th&gt;t1&lt;/th&gt;
 &lt;th&gt;t2&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;p1&lt;/td&gt;
 &lt;td&gt;10 rows&lt;/td&gt;
 &lt;td&gt;20 rows&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;p2&lt;/td&gt;
 &lt;td&gt;10 rows&lt;/td&gt;
 &lt;td&gt;20 rows&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;p3&lt;/td&gt;
 &lt;td&gt;10 rows&lt;/td&gt;
 &lt;td&gt;20 rows&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;When t1 and t2 join,&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;Normally, all data from both partitioned tables needs to be extracted for joining. The number of row comparison operations would be:
(10+10+10)*(20+20+20)=180&lt;/li&gt;
&lt;li&gt;With partition wise join, since the structures are similar, only corresponding partitions need to be joined, e.g.:
t1.p1&amp;lt;=&amp;gt;t2.p1,
t1.p2&amp;lt;=&amp;gt;t2.p2,
t1.p3&amp;lt;=&amp;gt;t2.p3,
The number of row comparison operations becomes:
(10*20)*3=90&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When there are many partitions, the cost savings of partition wise join are significant.
Parameter &lt;code&gt;enable_partitionwise_join&lt;/code&gt;: whether to enable partition wise join, default is off.&lt;/p&gt;
&lt;p&gt;The prerequisites for partition wise join are very strict:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The join condition must include the partition key.&lt;/li&gt;
&lt;li&gt;The partition keys must be of the same data type.&lt;/li&gt;
&lt;li&gt;Partitions must correspond one-to-one.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While these conditions seem strict, it&amp;rsquo;s relatively rare for tables with different purposes to produce partition wise join scenarios. A common case would be both tables using RANGE time partitioning. Another scenario: a partitioned table self-joining also meets partition wise join prerequisites:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Without partition wise join enabled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; p1.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,p2.name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 p1,lzlpartition1 p2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; p1.date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;p2.date_created &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; p2.name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;546&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;64&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;9256&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;182252&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (p1.date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; p2.date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2085&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;85364&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 p1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 p1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 p1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;541&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;541&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;427&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;541&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;427&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 p2_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;284&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 p2_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;95&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;248&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 p2_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With partition wise join enabled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partitionwise_join &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; p1.&lt;span style="color:#f92672"&gt;*&lt;/span&gt;,p2.name &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 p1,lzlpartition1 p2 &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; p1.date_created&lt;span style="color:#f92672"&gt;=&lt;/span&gt;p2.date_created &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; p2.name&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;287&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2529&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;438&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;287&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1338&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;232&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (p1_1.date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; p2_1.date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 p1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;284&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;284&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 p2_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;284&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;7&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;227&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;250&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;99&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1166&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;55&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;202&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (p1_2.date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; p2_2.date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 p1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;248&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;248&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 p2_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;95&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;248&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;52&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;198&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;37&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;60&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;288&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (p1_3.date_created &lt;span style="color:#f92672"&gt;=&lt;/span&gt; p2_3.date_created)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 p1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;150&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap Heap Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 p2_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;6&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;35&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;146&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Recheck&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Bitmap &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304_name_idx (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: ((name)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;256ac66bb53d31bc6124294238d6410c&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;25&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without partition wise join enabled, the optimizer first accesses all partition data from p2 (matching the filter) and combines them (Append), then Hash Joins with all partition data from p1 through the partition key.
With partition wise join enabled, the optimizer joins corresponding partitions from p1 and p2 (actually the same table accessed twice):
p1_1&amp;lt;=&amp;gt;p2_1 Hash Join
p1_2&amp;lt;=&amp;gt;p2_2 Hash Join
p1_3&amp;lt;=&amp;gt;p2_3 Hash Join
Then combines the data together (Append).
If there are enough data partitions, combined with partition pruning, partition wise join can have very good optimization effects.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Partition Wise Grouping/Aggregation
 &lt;div id="partition-wise-groupingaggregation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#partition-wise-groupingaggregation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;When performing aggregation on partitioned data, partitions can each compute independently — there is no need to scan all partition data for aggregation. Each partition computes its own aggregation, then the results are collected and returned.
Without partition wise grouping, it&amp;rsquo;s essentially &amp;ldquo;&lt;strong&gt;scan all partitions first, then aggregate&lt;/strong&gt;.&amp;rdquo; With partition wise grouping, it&amp;rsquo;s &amp;ldquo;&lt;strong&gt;aggregate per partition first, then combine results&lt;/strong&gt;.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Advantages of partition wise grouping:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When partitions are on foreign servers, the aggregation operator can be pushed down to the foreign server.&lt;/li&gt;
&lt;li&gt;When aggregating into hash tables, each partition rather than the entire table uses the memory hash table space, reducing memory usage.&lt;/li&gt;
&lt;li&gt;Aggregation algorithms pushed down to individual partitions can better utilize features like indexes and parallelism.&lt;/li&gt;
&lt;li&gt;Fewer data comparisons. Although data scanning is the same, there are fewer data comparisons — for example, data from the last partition does not need to be compared with data from the first partition.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Parameter &lt;code&gt;enable_partitionwise_aggregate&lt;/code&gt;: whether to enable partition wise grouping/aggregation, default is off.&lt;/p&gt;
&lt;p&gt;Partition wise aggregate example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt;) lzlpartition1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Without wise agg
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partitionwise_aggregate &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;off&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; date_created,&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(id),&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; date_created &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10354&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;94&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10562&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;89&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;83180&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.date_created, (&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(lzlpartition1.id)), (&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2725&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3557&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;49&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;83180&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2085&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;46&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;85364&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_3 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- With wise agg enabled
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; enable_partitionwise_aggregate &lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SET&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; date_created,&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(id),&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; date_created &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;10356&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;08&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10564&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;32&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;83296&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.date_created, (&lt;span style="color:#66d9ef"&gt;min&lt;/span&gt;(lzlpartition1.id)), (&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1219&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;3548&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;83296&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1219&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1663&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;09&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;44387&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1061&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;77&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1448&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;86&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;38709&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1_1.date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;19&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;88&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1_2.date_created
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Without partition wise aggregate: first scan all data then combine (Append), then aggregate (HashAggregate).
With partition wise aggregate: first aggregate on each partition (HashAggregate), then combine results (Append).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Partial Aggregation&lt;/strong&gt;
The aggregation algorithm can be pushed down to partitions for computation. At this point, the aggregated results fall into two categories: non-duplicate aggregation data (GROUP BY includes the partition key), and duplicate aggregation data (GROUP BY does not include the partition key).
When aggregation data is non-duplicate, simply appending the per-partition computed aggregation data is sufficient (as in the example above). When per-partition aggregation data has duplicates, an additional aggregation step (Finalize Aggregate) is needed. Aggregation that does not include the partition key is partial aggregation.&lt;/p&gt;
&lt;p&gt;Partial aggregation example:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- When GROUP BY is not the partition key
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; enable_partitionwise_aggregate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; enable_partitionwise_aggregate 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id,&lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; id ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Finalize HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2474&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2573&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9900&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1105&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2377&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;19467&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1105&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;76&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1202&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9652&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;962&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;95&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1059&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;9615&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1_1.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; HashAggregate (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;75&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;200&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: lzlpartition1_2.id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When GROUP BY does not include the partition key, aggregation can still be performed, but a subsequent Finalize HashAggregate is required.&lt;/p&gt;
&lt;p&gt;Even without GROUP BY, Partial Aggregate can still occur:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; enable_partitionwise_aggregate;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; enable_partitionwise_aggregate 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;count&lt;/span&gt;(&lt;span style="color:#f92672"&gt;*&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Finalize &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;max&lt;/span&gt;(date_created) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlpartition1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Finalize &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1872&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;992&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;31&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202301 lzlpartition1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;878&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;45384&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;12&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;864&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202302 lzlpartition1_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;765&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;39530&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Partial&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Aggregate&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;62&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;63&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlpartition1_202304 lzlpartition1_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;450&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The precondition for triggering Partial Aggregate is not GROUP BY. We should think from the purpose of Partial Aggregate — it aims to push aggregation down to partitions. Aggregation without GROUP BY can also be done this way, as shown in the two examples above: they both compute aggregation on each partition first (Partial Aggregate), then combine and aggregate once more (Finalize Aggregate). Without the parameter enabled, these aggregations would occur after scanning all partitions.&lt;/p&gt;

&lt;h2 class="relative group"&gt;History of Partitioned Tables
 &lt;div id="history-of-partitioned-tables" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#history-of-partitioned-tables" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Declarative partitioning has gone through many version enhancements and is now very mature. Here&amp;rsquo;s a summary of declarative partitioning feature enhancements across PostgreSQL versions:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pre-PG9.6&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Only inheritance tables could implement partitioning functionality.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG10&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Declarative partitioning supported.&lt;/li&gt;
&lt;li&gt;RANGE and LIST partitioning supported.&lt;/li&gt;
&lt;li&gt;ATTACH/DETACH table partitions supported.&lt;/li&gt;
&lt;li&gt;Partition pruning supported.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG11&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Added HASH partition support.&lt;/li&gt;
&lt;li&gt;Support for creating primary keys, foreign keys, indexes, and triggers.&lt;/li&gt;
&lt;li&gt;Support for updating partition key; automatic creation of indexes on partitions.&lt;/li&gt;
&lt;li&gt;Support for DEFAULT partition.&lt;/li&gt;
&lt;li&gt;Support for ATTACH index.&lt;/li&gt;
&lt;li&gt;Support for FOR EACH ROW triggers, automatically created on existing and future child partitions.&lt;/li&gt;
&lt;li&gt;New &lt;code&gt;enable_partition_pruning&lt;/code&gt; parameter; pruning enhancements.&lt;/li&gt;
&lt;li&gt;Support for partition wise join.&lt;/li&gt;
&lt;li&gt;Support for partition wise aggregation.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG12&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced query, insert, pruning, and COPY performance.&lt;/li&gt;
&lt;li&gt;Support for foreign key constraints referencing partitioned tables.&lt;/li&gt;
&lt;li&gt;Support for non-blocking partition ATTACH: &lt;code&gt;ALTER TABLE ATTACH PARTITION&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG13&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced pruning.&lt;/li&gt;
&lt;li&gt;Enhanced partition wise join.&lt;/li&gt;
&lt;li&gt;Support for BEFORE triggers.&lt;/li&gt;
&lt;li&gt;Support for publishing partitioned tables; support for subscribing and writing to partitioned tables.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG14&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced UPDATE and DELETE performance.&lt;/li&gt;
&lt;li&gt;Support for non-blocking partition DETACH: &lt;code&gt;ALTER TABLE ... DETACH PARTITION ... CONCURRENTLY&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Support for REINDEX on partitioned table indexes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG15&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced execution plan generation, reducing generation time with many partitions.&lt;/li&gt;
&lt;li&gt;Enhanced sorting.&lt;/li&gt;
&lt;li&gt;Support for CLUSTER on partitioned tables.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;PG16&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Enhanced GENERATED column restrictions: if the parent table has a generated column, child partitions must also include it.&lt;/li&gt;
&lt;li&gt;Enhanced lookup for RANGE and LIST partitions.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;《PostgreSQL修炼之道》&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/NW8XOZNq0YlDZvx24H737Q" target="_blank" rel="noreferrer"&gt;https://mp.weixin.qq.com/s/NW8XOZNq0YlDZvx24H737Q&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/ddl-partitioning.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/ddl-partitioning.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/current/ddl-inherit.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/current/ddl-inherit.html&lt;/a&gt;
&lt;a href="https://www.postgresql.org/docs/13/sql-altertable.html" target="_blank" rel="noreferrer"&gt;https://www.postgresql.org/docs/13/sql-altertable.html&lt;/a&gt;
&lt;a href="https://github.com/postgrespro/pg_pathman" target="_blank" rel="noreferrer"&gt;https://github.com/postgrespro/pg_pathman&lt;/a&gt;
&lt;a href="https://developer.aliyun.com/article/62314" target="_blank" rel="noreferrer"&gt;https://developer.aliyun.com/article/62314&lt;/a&gt;
&lt;a href="https://hevodata.com/learn/postgresql-partitions" target="_blank" rel="noreferrer"&gt;https://hevodata.com/learn/postgresql-partitions&lt;/a&gt;
&lt;a href="https://www.postgresql.fastware.com/postgresql-insider-prt-ove" target="_blank" rel="noreferrer"&gt;https://www.postgresql.fastware.com/postgresql-insider-prt-ove&lt;/a&gt;
&lt;a href="https://www.buckenhofer.com/2021/01/postgresql-partitioning-guide/" target="_blank" rel="noreferrer"&gt;https://www.buckenhofer.com/2021/01/postgresql-partitioning-guide/&lt;/a&gt;
&lt;a href="https://www.depesz.com/2018/05/01/waiting-for-postgresql-11-support-partition-pruning-at-execution-time/" target="_blank" rel="noreferrer"&gt;https://www.depesz.com/2018/05/01/waiting-for-postgresql-11-support-partition-pruning-at-execution-time/&lt;/a&gt;
&lt;a href="https://blog.csdn.net/horses/article/details/86164273" target="_blank" rel="noreferrer"&gt;https://blog.csdn.net/horses/article/details/86164273&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="http://www.pgsql.tech/article_0_10000102" target="_blank" rel="noreferrer"&gt;http://www.pgsql.tech/article_0_10000102&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://brandur.org/fragments/postgres-partitioning-2022" target="_blank" rel="noreferrer"&gt;https://brandur.org/fragments/postgres-partitioning-2022&lt;/a&gt;&lt;/p&gt;</content:encoded></item><item><title>Some Features of PostgreSQL Logical Replication</title><link>https://lastdba.com/en/2024/08/12/some-features-of-postgresql-logical-replication/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/some-features-of-postgresql-logical-replication/</guid><description>&lt;p&gt;I&amp;rsquo;ve already written a fairly detailed &lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;article about logical replication&lt;/a&gt; before, so I won&amp;rsquo;t repeat the basics here. However, some knowledge points inevitably get missed. Recently I&amp;rsquo;ve discovered some interesting logical replication features.&lt;/p&gt;

&lt;h2 class="relative group"&gt;replica identity and old/new values
 &lt;div id="replica-identity-and-oldnew-values" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replica-identity-and-oldnew-values" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;replica identity is used to identify a row during logical replication.
The above statement is certainly correct, but it doesn&amp;rsquo;t explain the changes in old and new data.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;code&gt;DEFAULT&lt;/code&gt;
Records the old values of the columns of the primary key, if any. This is the default for non-system tables.
&lt;code&gt;USING INDEX&lt;/code&gt; index_name
Records the old values of the columns covered by the named index, that must be unique, not partial, not deferrable, and include only columns marked &lt;code&gt;NOT NULL&lt;/code&gt;. If this index is dropped, the behavior is the same as &lt;code&gt;NOTHING&lt;/code&gt;.
&lt;code&gt;FULL&lt;/code&gt;
Records the old values of all columns in the row.
&lt;code&gt;NOTHING&lt;/code&gt;
Records no information about the old row. This is the default for system tables.&lt;/p&gt;</description><content:encoded>&lt;p&gt;I&amp;rsquo;ve already written a fairly detailed &lt;a href="https://blog.csdn.net/qq_40687433/article/details/129291207" target="_blank" rel="noreferrer"&gt;article about logical replication&lt;/a&gt; before, so I won&amp;rsquo;t repeat the basics here. However, some knowledge points inevitably get missed. Recently I&amp;rsquo;ve discovered some interesting logical replication features.&lt;/p&gt;

&lt;h2 class="relative group"&gt;replica identity and old/new values
 &lt;div id="replica-identity-and-oldnew-values" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#replica-identity-and-oldnew-values" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;replica identity is used to identify a row during logical replication.
The above statement is certainly correct, but it doesn&amp;rsquo;t explain the changes in old and new data.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;&lt;code&gt;DEFAULT&lt;/code&gt;
Records the old values of the columns of the primary key, if any. This is the default for non-system tables.
&lt;code&gt;USING INDEX&lt;/code&gt; index_name
Records the old values of the columns covered by the named index, that must be unique, not partial, not deferrable, and include only columns marked &lt;code&gt;NOT NULL&lt;/code&gt;. If this index is dropped, the behavior is the same as &lt;code&gt;NOTHING&lt;/code&gt;.
&lt;code&gt;FULL&lt;/code&gt;
Records the old values of all columns in the row.
&lt;code&gt;NOTHING&lt;/code&gt;
Records no information about the old row. This is the default for system tables.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The PG official documentation only explains the old value situation for replica identity — for example, it doesn&amp;rsquo;t even mention that NOTHING won&amp;rsquo;t replicate update/delete. This shows the importance of old values.&lt;/p&gt;
&lt;p&gt;Creating a replication link:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;pubtestlzl2&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test_decoding&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_recvlogical &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#75715e"&gt;--slot=pubtestlzl2 --start -f recv.sql &amp;amp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Normal test_decoding replication link simulation:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--replica identity defaults to d: uses primary key when available; without primary key, defaults to nothing, unable to replicate update and delete
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzltest(a bigint &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,b varchar(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzltest &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;bbbbbb&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;ccccccccc&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzltest &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;recvlogical output:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.lzltest: &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt;: a[bigint]:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;bbbbbb&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;ccccccccc&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.lzltest: &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;: a[bigint]:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;ccccccccc&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With replica identity as default, updating a non-primary-key field — all fields have only new values.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzltest &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;111&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.lzltest: &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;old&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: a[bigint]:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;new&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;tuple: a[bigint]:&lt;span style="color:#ae81ff"&gt;111&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;bb&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;ccccccccc&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With replica identity as default, updating the primary key — the identity column&amp;rsquo;s old and new values are decoded; other fields only have new values.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzltest replica &lt;span style="color:#66d9ef"&gt;identity&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;full&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;M&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzltest &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.lzltest: &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;: &lt;span style="color:#66d9ef"&gt;old&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: a[bigint]:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;ccccccccc&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;new&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;tuple: a[bigint]:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;b&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;ccccccccc&amp;#39;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With replica identity set to full, both old and new values for the entire row are preserved.&lt;/p&gt;
&lt;p&gt;Whether in default (primary key) or full mode, all column information is recorded. The difference lies in whether old data is present. In default mode:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;insert: inherently new data, so naturally no old values — all column new values are recorded.&lt;/li&gt;
&lt;li&gt;update: records new values for all columns; &lt;strong&gt;only the identity column has old values (if the identity column was updated)&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;delete: inherently old data, but not all columns are necessarily recorded. The same rule applies: &lt;em&gt;only the identity column has old values&lt;/em&gt; — only the identity column is recorded.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Summary: When replica identity is default, regardless of the operation (INSERT, UPDATE, DELETE), as long as it&amp;rsquo;s old data, only the identity column is recorded; as long as it&amp;rsquo;s new data, all columns are recorded.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;When changing from default to full, the decoded log volume difference isn&amp;rsquo;t particularly large, because new data always includes all columns. (Excluding scenarios that are &lt;em&gt;entirely&lt;/em&gt; deletes) the log volume decoded under full is less than twice that of default.&lt;/p&gt;

&lt;h2 class="relative group"&gt;pgoutput cannot be peeked
 &lt;div id="pgoutput-cannot-be-peeked" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pgoutput-cannot-be-peeked" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Create a replication slot using pgoutput:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;pubtestlzl&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;pgoutput&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then try to peek or receive — both fail:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;pubtestlzl&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_recvlogical &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#75715e"&gt;--slot=pubtestlzl --start -f recv.sql &amp;amp;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_recvlogical: error: could not send replication command &lt;span style="color:#e6db74"&gt;&amp;#34;START_REPLICATION SLOT &amp;#34;&lt;/span&gt;pubtestlzl&lt;span style="color:#e6db74"&gt;&amp;#34; LOGICAL 0/0&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;:&lt;/span&gt; ERROR: client sent proto_version&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; but we only support protocol &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; or higher
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CONTEXT: slot &lt;span style="color:#e6db74"&gt;&amp;#34;pubtestlzl&amp;#34;&lt;/span&gt;, output plugin &lt;span style="color:#e6db74"&gt;&amp;#34;pgoutput&amp;#34;&lt;/span&gt;, in the startup callback
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_recvlogical: disconnected; waiting &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; seconds to try again&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You cannot peek or use pg_recvlogical to receive from a pgoutput replication slot. Since pgoutput is the output plugin for publish-subscribe, this plugin cannot be manually peeked or received&amp;hellip;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Publish-Subscribe Doesn&amp;rsquo;t Have to Be PG-to-PG
 &lt;div id="publish-subscribe-doesnt-have-to-be-pg-to-pg" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#publish-subscribe-doesnt-have-to-be-pg-to-pg" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;create publication&lt;/code&gt; and &lt;code&gt;create subscription&lt;/code&gt; are PG internal commands that can also be used to create links between PG databases.
Third-party software can similarly use create publication and simulate subscriptions to create replication slots. This is better than directly creating replication slots because publications can manage replicated tables.&lt;/p&gt;

&lt;h2 class="relative group"&gt;TOAST and Logical Decoding
 &lt;div id="toast-and-logical-decoding" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#toast-and-logical-decoding" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;TOAST columns being sent are NOT decoded! This means an entire row of data may only have part of it transmitted (when TOAST columns themselves haven&amp;rsquo;t been updated).&lt;/p&gt;
&lt;p&gt;Normal decoding decodes all columns:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create a test_decoding replication slot
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_dest&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;test_decoding&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_create_logical_replication_slot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (logical_dest,&lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A80040E0)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Create a table with small columns
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; test1(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;,b varchar(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;),&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;&lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_replication_slots;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; slot_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; plugin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; slot_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; datoid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active_pid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; catalog_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; restart_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; confirmed_flush_lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; wal_status &lt;span style="color:#f92672"&gt;|&lt;/span&gt; safe_wal_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+---------------+-----------+--------+----------+-----------+--------+------------+--------+--------------+--------------+---------------------+------------+---------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; logical_dest &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test_decoding &lt;span style="color:#f92672"&gt;|&lt;/span&gt; logical &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;418679&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; f &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;] &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A80040A8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A80040E0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reserved &lt;span style="color:#f92672"&gt;|&lt;/span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; test1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;915&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_dest&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------+--------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8004C78 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A80103E8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8018B30 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8018B30 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.test1: &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt;: a[integer]:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8018C50 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--insert is decoded, containing all columns
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; test1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zxcv&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;005&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_dest&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------+--------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8004C78 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A80103E8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483335&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8018B30 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8018B30 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.test1: &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt;: a[integer]:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8018C50 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483369&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A801D018 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483378&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483378&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A801D018 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483378&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.test1: &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;: a[integer]:&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;zxcv&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;qwer&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A801D098 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483378&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483378&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--update is decoded, containing all columns&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Normally, without TOAST, decoded data includes all columns of the row.&lt;/p&gt;
&lt;p&gt;TOAST decoding test:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Enlarge the columns
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; test1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; b &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;091&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; test1 &lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;column&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;ALTER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;937&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--A batch random function
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;or&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;replace&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;function&lt;/span&gt; f_random_str(&lt;span style="color:#66d9ef"&gt;length&lt;/span&gt; INTEGER) &lt;span style="color:#66d9ef"&gt;returns&lt;/span&gt; character varying
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LANGUAGE&lt;/span&gt; plpgsql
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DECLARE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;result&lt;/span&gt; varchar(&lt;span style="color:#ae81ff"&gt;3000&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; array_to_string(ARRAY(&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; chr((&lt;span style="color:#ae81ff"&gt;65&lt;/span&gt; &lt;span style="color:#f92672"&gt;+&lt;/span&gt; round(random() &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;25&lt;/span&gt;)) :: integer)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;length&lt;/span&gt;)), &lt;span style="color:#e6db74"&gt;&amp;#39;&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;INTO&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;result&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;result&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;END&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;$$&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FUNCTION&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Insert data
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; test1 &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,f_random_str(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;),f_random_str(&lt;span style="color:#ae81ff"&gt;2000&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Check for TOAST
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; n.nspname &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt;, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; s.oid::regclass &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; relname, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; s.reltoastrelid::regclass &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; toast_name, 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; pg_relation_size(s.reltoastrelid) &lt;span style="color:#66d9ef"&gt;AS&lt;/span&gt; toast_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; pg_class s &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; pg_namespace n 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; s.relnamespace &lt;span style="color:#f92672"&gt;=&lt;/span&gt; n.oid 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;WHERE&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; relkind &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;r&amp;#39;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; reltoastrelid &lt;span style="color:#f92672"&gt;&amp;lt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; n.nspname &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;public&amp;#39;&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ORDER&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DESC&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; toast_name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; toast_size 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+---------+--------------------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; test1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; pg_toast.pg_toast_418714 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--Update via primary key, updating a TOAST column
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; test1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zxcv&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_dest&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------+-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A851FD90 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483420&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483420&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A85216E0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483420&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.test1: &lt;span style="color:#66d9ef"&gt;INSERT&lt;/span&gt;: a[integer]:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;GIORCXQQWDBGTUNDZXAWMPYOUEGTECWTVQGDQGSPMEPJNPUQIFMESLRASBZWGONETRENDCHLDWVTDWJLTGRYUMFDOWHLEYLUTECPOVCYXFIATLKVEQTHSC&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A85218A0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483420&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483420&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8525CA8 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483429&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BEGIN&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483429&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8525D50 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483429&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;.test1: &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt;: a[integer]:&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; b[character varying]:&lt;span style="color:#e6db74"&gt;&amp;#39;zxcv&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;[character varying]:unchanged&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;toast&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;datum
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8525DE0 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483429&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;COMMIT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483429&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Column c, which has TOAST and was not involved in the update, has no decoded data — it directly outputs toast datum unchanged: &lt;code&gt;unchanged-toast-datum&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Testing with wal2json:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_create_logical_replication_slot(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_json&amp;#39;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#39;wal2json&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_create_logical_replication_slot 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (logical_json,&lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A87CAB58)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; test1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zxcv&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;pset format wrapped
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;Output&lt;/span&gt; format &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; wrapped.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;pset columns &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Target width &lt;span style="color:#66d9ef"&gt;is&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;200&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_json&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A87CACF8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483495&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;change&amp;#34;&lt;/span&gt;:[&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;kind&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;update&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;schema&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;public&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;table&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;test1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;columnnames&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;b&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;columntypes&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;integer&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;character varying(3000)&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;columnvalues&amp;#34;&lt;/span&gt;:[&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;zxcv&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;oldkeys&amp;#34;&lt;/span&gt;:&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;keynames&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;],.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;keytypes&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;integer&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;keyvalues&amp;#34;&lt;/span&gt;:[&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;}}&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; test1 &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt; b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;zxcv&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;391&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_logical_slot_peek_changes(&lt;span style="color:#e6db74"&gt;&amp;#39;logical_json&amp;#39;&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A87CACF8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483495&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;change&amp;#34;&lt;/span&gt;:[&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;kind&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;update&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;schema&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;public&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;table&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;test1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;columnnames&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;b&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;columntypes&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;integer&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;character varying(3000)&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;columnvalues&amp;#34;&lt;/span&gt;:[&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;zxcv&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;oldkeys&amp;#34;&lt;/span&gt;:&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;keynames&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;],.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;keytypes&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;integer&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;keyvalues&amp;#34;&lt;/span&gt;:[&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;}}&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;[ RECORD &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; ]&lt;span style="color:#75715e"&gt;------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lsn &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;349&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;A8CCA0D8
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;xid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;872483509&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;data&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;change&amp;#34;&lt;/span&gt;:[&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;kind&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;update&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;schema&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;public&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;table&amp;#34;&lt;/span&gt;:&lt;span style="color:#e6db74"&gt;&amp;#34;test1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;columnnames&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;b&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;c&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;columntypes&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;integer&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;character varying(3000)&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;character varying(3000)&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;columnvalues&amp;#34;&lt;/span&gt;:[&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;zxcv&amp;#34;&lt;/span&gt;.
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;.,&lt;span style="color:#e6db74"&gt;&amp;#34;qwer&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;oldkeys&amp;#34;&lt;/span&gt;:&lt;span style="color:#960050;background-color:#1e0010"&gt;{&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;keynames&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;a&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;keytypes&amp;#34;&lt;/span&gt;:[&lt;span style="color:#e6db74"&gt;&amp;#34;integer&amp;#34;&lt;/span&gt;],&lt;span style="color:#e6db74"&gt;&amp;#34;keyvalues&amp;#34;&lt;/span&gt;:[&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;}}&lt;/span&gt;]&lt;span style="color:#960050;background-color:#1e0010"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;--When updating, column c data is not decoded&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;wal2json shows the same behavior.&lt;/p&gt;
&lt;p&gt;MySQL&amp;rsquo;s &lt;a href="https://dev.mysql.com/doc/refman/8.0/en/replication-options-binary-log.html#sysvar_binlog_row_image" target="_blank" rel="noreferrer"&gt;&lt;code&gt;binlog_row_image&lt;/code&gt;&lt;/a&gt; parameter can adjust whether binlog records large fields:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;full&lt;/code&gt; (Log all columns)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;minimal&lt;/code&gt; (Log only changed columns, and columns needed to identify rows)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;noblob&lt;/code&gt; (Log all columns, except for unneeded BLOB and TEXT columns)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;PG has absolutely no such control — by default, TOAST columns are not decoded, and there are no other options to configure~&lt;/p&gt;</content:encoded></item><item><title>The Table I Wanted to Query Was Not in the Execution Plan</title><link>https://lastdba.com/en/2024/08/12/the-table-i-wanted-to-query-was-not-in-the-execution-plan/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/the-table-i-wanted-to-query-was-not-in-the-execution-plan/</guid><description>&lt;h2 class="relative group"&gt;Problem: The Queried Table Did Not Appear in the Execution Plan
 &lt;div id="problem-the-queried-table-did-not-appear-in-the-execution-plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-the-queried-table-did-not-appear-in-the-execution-plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column1 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- many A columns omitted in between
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column99 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column99&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a AA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;inner&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; table_b BB &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; AA.lzl_key &lt;span style="color:#f92672"&gt;=&lt;/span&gt; BB.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AA.column_code &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) B &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; B.lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; A.lzl_key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.flagflagflag &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; A.typetypetype &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) TEMP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Execution plan:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem: The Queried Table Did Not Appear in the Execution Plan
 &lt;div id="problem-the-queried-table-did-not-appear-in-the-execution-plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-the-queried-table-did-not-appear-in-the-execution-plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column1 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- many A columns omitted in between
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column99 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column99&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a AA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;inner&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; table_b BB &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; AA.lzl_key &lt;span style="color:#f92672"&gt;=&lt;/span&gt; BB.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AA.column_code &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) B &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; B.lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; A.lzl_key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.flagflagflag &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; A.typetypetype &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) TEMP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;68&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1105&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;038&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;039&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1105&lt;/span&gt;) (actual time&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;036&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;037&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; loops&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((flagflagflag)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((typetypetype)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; Removed &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; Filter: &lt;span style="color:#ae81ff"&gt;38&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Buffers: shared hit&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Planning Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;184&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Execution Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;066&lt;/span&gt; ms&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As you can see, the SQL itself is fairly complex. Logically, the SQL queries 3 tables / accesses 2 tables total. I can understand &lt;code&gt;table_a&lt;/code&gt; appearing in the execution plan, but &lt;code&gt;table_b&lt;/code&gt;, which needed to be queried, wasn&amp;rsquo;t in the execution plan at all! The execution plan was simply a sequential scan of &lt;code&gt;table_a&lt;/code&gt;.&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Analytical Journey
 &lt;div id="the-analytical-journey" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-analytical-journey" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;In the middle of the analysis, I actually considered many possibilities, but the most likely one was logical optimization — that is, the PostgreSQL optimizer determined that &lt;code&gt;table_b&lt;/code&gt; didn&amp;rsquo;t need to be queried.&lt;/p&gt;
&lt;p&gt;Observing the SQL, I noticed that the final query only selected columns from &lt;code&gt;table_a&lt;/code&gt;, without any columns from &lt;code&gt;table_b&lt;/code&gt;. Adding any column from the intermediate table B made the SQL execution plan appear &amp;ldquo;normal&amp;rdquo; — it accessed &lt;code&gt;table_b&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column1 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column1&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#75715e"&gt;-- many A columns omitted in between
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.column99 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;column99&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; B.lzl_id &lt;span style="color:#75715e"&gt;-- added a column from intermediate table B
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a A
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; table_a AA
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;inner&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; table_b BB &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; AA.lzl_key &lt;span style="color:#f92672"&gt;=&lt;/span&gt; BB.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; AA.column_code &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) B &lt;span style="color:#66d9ef"&gt;ON&lt;/span&gt; B.lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; A.lzl_key
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; A.flagflagflag &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; A.typetypetype &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ) TEMP
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;offset&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;17&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;67&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1113&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Nested Loop &lt;span style="color:#66d9ef"&gt;Left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;69&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1113&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; Filter: (bb.lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; a.lzl_key)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a a (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;84&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1113&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: (((flagflagflag)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;::text) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((typetypetype)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2&amp;#39;&lt;/span&gt;::text))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;74&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: bb.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Sort (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;72&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;73&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Sort &lt;span style="color:#66d9ef"&gt;Key&lt;/span&gt;: bb.lzl_id
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Nested Loop (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;11&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;66&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_a aa (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;70&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filter: ((company_code)::text &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;1&amp;#39;&lt;/span&gt;::text)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; idx_table_b_lzl_id &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; table_b bb (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;83&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (lzl_id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; aa.lzl_key)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This seems related to LEFT JOIN, but a quick thought makes it seem incorrect — after all, the results from the right table should affect the final query result, so the right table shouldn&amp;rsquo;t be skipped. Let&amp;rsquo;s try a simple LEFT JOIN:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;lzlright.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash &lt;span style="color:#66d9ef"&gt;Left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;04&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;15&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;47&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;320&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzlleft.a &lt;span style="color:#f92672"&gt;=&lt;/span&gt; lzlright.a)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;320&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlright (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The right table is scanned. But, in intermediate table B, there&amp;rsquo;s the keyword &lt;code&gt;GROUP BY&lt;/code&gt;. If we remove &lt;code&gt;GROUP BY&lt;/code&gt;, then &lt;code&gt;table_b&lt;/code&gt; is accessed regardless of whether we query columns from B.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s add a GROUP BY in our test table and see the result:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+-----
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; zzz
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;259&lt;/span&gt; ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---+-------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; qwer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; poiuy 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlright.b &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;full&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.b&lt;span style="color:#f92672"&gt;=&lt;/span&gt;lzlright.b &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lzlright.b;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; b 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; [&lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;]
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; poiuy
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; qwer
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is where I realized that the result set from GROUP BY must have a certain property — &lt;strong&gt;uniqueness&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s add GROUP BY in the test table:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;320&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The right table is not queried!&lt;/p&gt;
&lt;p&gt;Based on the principle of right-table uniqueness, we can also have some fun variations:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- distinct ensures right-table uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;distinct&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;13&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;320&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;) &lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-- unique index ensures right-table uniqueness, even with just select a from lzlright
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; explain select lzlleft.a from lzlleft left join (select a from lzlright) c on lzlleft.a=c.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;-----------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Left Join (cost=17.20..49.12 rows=512 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzlleft.a = lzlright.a)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Seq Scan on lzlleft (cost=0.00..13.20 rows=320 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Hash (cost=13.20..13.20 rows=320 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; -&amp;gt; Seq Scan on lzlright (cost=0.00..13.20 rows=320 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(5 rows)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: 0.510 ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; create unique index idx_right on lzlright(a);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;CREATE INDEX
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: 3.576 ms
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&amp;gt; explain select lzlleft.a from lzlleft left join (select a from lzlright) c on lzlleft.a=c.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Seq Scan on lzlleft (cost=0.00..13.20 rows=320 width=4)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(1 row)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Here&amp;rsquo;s a summary of the analysis: when the right table&amp;rsquo;s data is unique and only the left table&amp;rsquo;s data is being queried, there&amp;rsquo;s no need to actually access the right table. So this is not a bug, but a feature of the PostgreSQL optimizer — and it makes logical sense.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;No source code analysis this time~&lt;/p&gt;
&lt;p&gt;The optimizer source code is just too difficult. I only looked at some optimizer source code comments. Search for the keyword &lt;code&gt;unique-ify&lt;/code&gt;, and you&amp;rsquo;ll find this:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; Also, this routine and others in this module accept the special JoinTypes
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; JOIN_UNIQUE_OUTER and JOIN_UNIQUE_INNER to indicate that we should
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; unique&lt;span style="color:#f92672"&gt;-&lt;/span&gt;ify the outer or inner relation and then apply a regular inner
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; join. These values are not allowed to propagate outside this module,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; however. Path cost estimation code may need to recognize that it&lt;span style="color:#960050;background-color:#1e0010"&gt;&amp;#39;&lt;/span&gt;s
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; dealing with such a &lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; &lt;span style="color:#f92672"&gt;---&lt;/span&gt; the combination of nominal jointype INNER
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; with sjinfo&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;jointype &lt;span style="color:#f92672"&gt;==&lt;/span&gt; JOIN_SEMI indicates that. 
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Special JoinTypes: &lt;code&gt;JOIN_UNIQUE_INNER&lt;/code&gt; and &lt;code&gt;JOIN_UNIQUE_OUTER&lt;/code&gt; — they try to unique-ify the outer and inner relations and then treat them as an inner join. Path cost estimation needs to consider this scenario.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Comparison with Oracle and MySQL Optimizers
 &lt;div id="comparison-with-oracle-and-mysql-optimizers" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#comparison-with-oracle-and-mysql-optimizers" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Let&amp;rsquo;s compare whether Oracle and MySQL optimizers have similar logical optimization improvements.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Oracle
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlleft(a number);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlright(a number);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;distinct&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- GROUP BY uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; selected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: &lt;span style="color:#ae81ff"&gt;3533354041&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; Id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Operation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Cost (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;CPU)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; Time &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;STATEMENT&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OUTER&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ACCESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LZLLEFT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VIEW&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH &lt;span style="color:#66d9ef"&gt;GROUP&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ACCESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; LZLRIGHT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Predicate Information (identified &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;operation&lt;/span&gt; id):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;access&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;LZLLEFT&amp;#34;&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;(&lt;span style="color:#f92672"&gt;+&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- DISTINCT uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SQL&lt;/span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;distinct&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;no&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; selected
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Execution Plan
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Plan hash value: &lt;span style="color:#ae81ff"&gt;3859658234&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; Id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Operation&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Bytes &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Cost (&lt;span style="color:#f92672"&gt;%&lt;/span&gt;CPU)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; Time &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;SELECT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;STATEMENT&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH &lt;span style="color:#66d9ef"&gt;JOIN&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;OUTER&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;26&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ACCESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; LZLLEFT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VIEW&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; HASH &lt;span style="color:#66d9ef"&gt;UNIQUE&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ACCESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FULL&lt;/span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; LZLRIGHT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;13&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;01&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Predicate Information (identified &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;operation&lt;/span&gt; id):
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;access&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;LZLLEFT&amp;#34;&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;C&amp;#34;&lt;/span&gt;.&lt;span style="color:#e6db74"&gt;&amp;#34;A&amp;#34;&lt;/span&gt;(&lt;span style="color:#f92672"&gt;+&lt;/span&gt;))&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- MySQL
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlleft(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlright(a int &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- GROUP BY uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright &lt;span style="color:#66d9ef"&gt;group&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; a) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; select_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitions &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; possible_keys &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; key_len &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; filtered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Extra &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlleft &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;derived2&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;auto_key0&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;auto_key0&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb.lzlleft.a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DERIVED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlright &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- DISTINCT uniqueness
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; lzlleft.a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlleft &lt;span style="color:#66d9ef"&gt;left&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;join&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;distinct&lt;/span&gt; a &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzlright) &lt;span style="color:#66d9ef"&gt;c&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzlleft.a&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;.a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; select_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitions &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; possible_keys &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; key_len &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; filtered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Extra &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlleft &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;derived2&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;auto_key0&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;auto_key0&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzldb.lzlleft.a &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DERIVED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzlright &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;4&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+------------+-------+---------------+-------------+---------+-----------------+------+----------+-------------+&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In summary, neither Oracle nor MySQL performs the optimization of eliminating the right table in a LEFT JOIN when only left-table columns are queried and the right table is unique — they both access the right table.&lt;/p&gt;
&lt;p&gt;The PostgreSQL optimizer really has some impressive tricks.&lt;/p&gt;</content:encoded></item><item><title>Too Many Range Table Entries Even with Not-That-Many Partitions</title><link>https://lastdba.com/en/2024/08/12/too-many-range-table-entries-even-with-not-that-many-partitions/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/too-many-range-table-entries-even-with-not-that-many-partitions/</guid><description>&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL UPDATE statement throws error: &lt;code&gt;too many range table entries&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Original SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt;	LZLTAB &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If we rewrite UPDATE as SELECT, it succeeds:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;	LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;	id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; 	LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id	&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------+...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;161687&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)	&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Primary key and partitions — 400 partitions total:&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Problem Description
 &lt;div id="problem-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#problem-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;PostgreSQL UPDATE statement throws error: &lt;code&gt;too many range table entries&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Original SQL:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt;	LZLTAB &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If we rewrite UPDATE as SELECT, it succeeds:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt;	LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;	id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; 	LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt;	id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id	&lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; date_created 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------+----------------------------+...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2023&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;06&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;21&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;161687&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)	&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Primary key and partitions — 400 partitions total:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt;: RANGE (partition_key)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Indexes:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;pk_lzl&amp;#34;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;KEY&lt;/span&gt;, btree (id, partition_key)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partitions: lzl_p20230601 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230601&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230602&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_p20230602 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230602&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230603&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl_p20230603 &lt;span style="color:#66d9ef"&gt;FOR&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;FROM&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230603&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;TO&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230604&amp;#39;&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The SQL logic has many optimization opportunities, but we won&amp;rsquo;t discuss those here. The focus is on why UPDATE fails and why SELECT and UPDATE behave differently.&lt;/p&gt;
&lt;p&gt;EXPLAIN UPDATE throws this error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (selec tid &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; LZLTAB &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; LZLTAB &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;54000&lt;/span&gt;: too many range &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; entries
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: add_rte_to_flat_rtable, setrefs.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;451&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Time: &lt;span style="color:#ae81ff"&gt;18341&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;171&lt;/span&gt; ms (&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;18&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;341&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;EXPLAIN took 18 seconds, then threw the error.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Source Code Analysis
 &lt;div id="source-code-analysis" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;The error directly points to the source location: &lt;code&gt;LOCATION: add_rte_to_flat_rtable, setrefs.c:451&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Find the source at &lt;code&gt;src/backend/optimizer/plan/setrefs.c&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The comment explains that setrefs.c handles post-processing of a completed plan tree:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *Post-processing of a completed plan tree: fix references to subplan
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 vars, compute regproc values for operators, etc
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Find the function at line 451:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Add (a copy of) the given RTE to the final rangetable
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In the flat rangetable, we zero out substructure pointers that are not
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * needed by the executor; this reduces the storage space and copying cost
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * for cached plans. We keep only the ctename, alias and eref Alias fields,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * which are needed by EXPLAIN, and the selectedCols, insertedCols,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * updatedCols, and extraUpdatedCols bitmaps, which are needed for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * executor-startup permissions checking and for trigger event checking.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;add_rte_to_flat_rtable&lt;/span&gt;(PlannerGlobal &lt;span style="color:#f92672"&gt;*&lt;/span&gt;glob, RangeTblEntry &lt;span style="color:#f92672"&gt;*&lt;/span&gt;rte)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Check for RT index overflow; it&amp;#39;s very unlikely, but if it did happen,
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * the executor would get confused by varnos that match the special varno
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * values.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;IS_SPECIAL_VARNO&lt;/span&gt;(&lt;span style="color:#a6e22e"&gt;list_length&lt;/span&gt;(glob&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;finalrtable)))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(ERROR,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				(&lt;span style="color:#a6e22e"&gt;errcode&lt;/span&gt;(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				 &lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;too many range table entries&amp;#34;&lt;/span&gt;)));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;errmsg() is at line 451. From the comments, add_rte_to_flat_rtable() is related to RTE. What is RTE? We&amp;rsquo;ll analyze below.&lt;/p&gt;
&lt;p&gt;The error check uses &lt;code&gt;IS_SPECIAL_VARNO()&lt;/code&gt;. Searching for this macro in &lt;code&gt;src/include/nodes/primnodes.h&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * Var - expression node representing a variable (ie, a table column)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In the parser and planner, varno and varattno identify the semantic
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * referent, which is a base-relation column unless the reference is to a join
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * USING column that isn&amp;#39;t semantically equivalent to either join input column
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * (because it is a FULL join or the input column requires a type coercion).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * In those cases varno and varattno refer to the JOIN RTE. (Early in the
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * planner, we replace such join references by the implied expression; but up
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * till then we want join reference Vars to keep their original identity for
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * query-printing purposes.)
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INNER_VAR		65000	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to inner subplan */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define OUTER_VAR		65001	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to outer subplan */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INDEX_VAR		65002	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to index column */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define IS_SPECIAL_VARNO(varno)		((varno) &amp;gt;= INNER_VAR)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The comment above is a bit dense, but one phrase is key: &lt;em&gt;In those cases varno and varattno refer to the JOIN RTE&lt;/em&gt;. varno is related to RTE.&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;varno&amp;gt;=65000&lt;/code&gt;, the error is thrown. (We won&amp;rsquo;t go into the differences between &lt;code&gt;INNER_VAR&lt;/code&gt;, &lt;code&gt;OUTER_VAR&lt;/code&gt;, and &lt;code&gt;INDEX_VAR&lt;/code&gt; here since their values are close and don&amp;rsquo;t affect the analysis.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What is RTE?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Descriptions of RTE (rangetable or RangeTblEntry) can be found throughout the execution plan source code, and the error is clear: &lt;code&gt;ERROR: 54000: too many range table entries&lt;/code&gt; — it&amp;rsquo;s about RTE. So what is RTE?&lt;/p&gt;
&lt;p&gt;In &lt;code&gt;src/include/nodes/parsenodes.h&lt;/code&gt;, there&amp;rsquo;s a description of RTE:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*--------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * RangeTblEntry -
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 A range table is a List of RangeTblEntry nodes.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 A range table entry may represent a plain relation, a sub-select in
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 FROM, or the result of a JOIN clause. (Only explicit JOIN syntax
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 produces an RTE, not the implicit join resulting from multiple FROM
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 items. This is because we only need the RTE to deal with SQL features
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 like outer joins and join-output-column aliasing.) Other special
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 RTE types also exist, as indicated by RTEKind.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 Note that we consider RTE_RELATION to cover anything that has a pg_class
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; *	 entry. relkind distinguishes the sub-cases.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Simply put, an RTE is a &amp;ldquo;table&amp;rdquo; in the execution plan — it can be a concrete table or a generated &amp;ldquo;table&amp;rdquo; like a subquery, join result, etc. The RTE limit of 65000 means too many RTEs were generated in the execution plan.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Viewing the UPDATE Execution Plan
 &lt;div id="viewing-the-update-execution-plan" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#viewing-the-update-execution-plan" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Since we now know what RTE is, looking at the SQL execution plan may help. But since the original SQL (400 partitions) couldn&amp;rsquo;t generate an execution plan, let&amp;rsquo;s create a 30-partition table and hopefully EXPLAIN it to observe the plan.&lt;/p&gt;
&lt;p&gt;30-partition table with the same UPDATE statement:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Generated execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; QUERY PLAN 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------------------------------------------------------------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;4980&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3042&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_1
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_2
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Update&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_30
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash Semi &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;166&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3042&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzl_1.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; t.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2912&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Subquery Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230601_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_32 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230602_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_33 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230630_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_61 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash Semi &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;166&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;3042&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzl_30.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; t_29.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_30 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;2912&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Subquery Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; t_29 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230601_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_931 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230602_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_932 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230630_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_960 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;2041&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The execution plan is extremely long — 2041 rows in total. This plan is very inefficient: every time a partition is updated, the predicate conditions are run against the partitioned table all over again. Since the SQL lacks a partition key, each run scans all partitions. For a 30-partition table, each partition is scanned 30 times, totaling 900 partition scans.&lt;/p&gt;
&lt;p&gt;From the execution plan, we can see that initially 30 RTEs were allocated for UPDATE up to lzl_30. Then each hash match per partition scan also allocated 30 RTEs — for example, the hash under lzl_1 has partition scans from lzl_32 to lzl_61. Why 32 instead of 31? Because the entire partition scan is a subquery and also an RTE, named t (and t, t1-t_29), totaling 30. So the total RTEs generated in the plan are 30+30+30×30=960.&lt;/p&gt;
&lt;p&gt;Looking at the SELECT execution plan, it&amp;rsquo;s very different from UPDATE:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; STATUS ,FILE_ID ,DATE_UPDATED &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Semi &lt;span style="color:#66d9ef"&gt;Join&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;48&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;467&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;90&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;98&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Hash Cond: (lzl.id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; lzl_31.id)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;309&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;600&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_1 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_2 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Seq Scan &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_30 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;20&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;106&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Hash (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;155&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;10&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Limit&lt;/span&gt; (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; Append (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;154&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;80&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;30&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230601_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230601 lzl_32 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230602_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230602 lzl_33 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Only&lt;/span&gt; Scan &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; lzl_p20230630_pkey &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl_p20230630 lzl_61 (cost&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;14&lt;/span&gt;..&lt;span style="color:#ae81ff"&gt;5&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; width&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Index&lt;/span&gt; Cond: (id &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;96&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;No repeated (Cartesian product-style) table access — RTEs only go up to 61. This is also why SELECT succeeds on 400 partitions, because 400×400 accesses is simply too many.&lt;/p&gt;
&lt;p&gt;So regarding the original SQL where UPDATE fails and SELECT succeeds, we can conclude:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;For 400 partitions with SELECT, the execution plan has 801 RTEs, which doesn&amp;rsquo;t exceed &lt;code&gt;INNER_VAR&lt;/code&gt; (65000), so it can generate a plan and execute.&lt;/li&gt;
&lt;li&gt;For 400 partitions with UPDATE, the execution plan has 160,160,400 RTEs, far exceeding &lt;code&gt;INNER_VAR&lt;/code&gt; (65000), so the plan cannot be generated and throws the RTE overflow error.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The cause is mostly analyzed, but the significant difference between SELECT and UPDATE plans is still puzzling. Let&amp;rsquo;s compare Oracle and MySQL execution plans horizontally.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Oracle Behavior
 &lt;div id="oracle-behavior" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#oracle-behavior" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Oracle partitioned table with local index:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzl (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id number &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; partition_key number &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; RANGE (partition_key)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230601 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230602&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230602 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230603&amp;#39;&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230630 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;20230631&amp;#39;&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; PKLZL &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; lzl(id, partition_key) &lt;span style="color:#66d9ef"&gt;local&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; pklzl &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; (id, partition_key) &lt;span style="color:#66d9ef"&gt;using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; pklzl;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Execution plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; rownum&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; STATUS ,FILE_ID ,DATE_UPDATED &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e6b4077b9290.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; sysdate
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;and&lt;/span&gt; rownum&lt;span style="color:#f92672"&gt;&amp;lt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/35e2fc036d9f.png" alt="image.png" /&gt;&lt;/p&gt;
&lt;p&gt;In Oracle, both SELECT and UPDATE use NESTED LOOP, accessing all partitions (PARTITION RANGE ALL). So in Oracle, regardless of SELECT or UPDATE, table t is the driving table. Because of IN, results are sorted and deduplicated. So Oracle&amp;rsquo;s plan is not 30×30 accesses but depends on the result set size in the driving table — n rows means n×30 partition accesses. Since driving table t has minimal data, this plan is fine.&lt;/p&gt;

&lt;h3 class="relative group"&gt;MySQL Behavior
 &lt;div id="mysql-behavior" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mysql-behavior" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since MySQL only supports local indexes, just create the primary key directly:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; test (
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; id bigint &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; date_created &lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; ,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION &lt;span style="color:#66d9ef"&gt;BY&lt;/span&gt; RANGE (partition_key) 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230601 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20230602&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230602 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20230603&lt;/span&gt;),
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;PARTITION lzl_p20230630 &lt;span style="color:#66d9ef"&gt;VALUES&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;LESS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;THAN&lt;/span&gt; (&lt;span style="color:#ae81ff"&gt;20230631&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;primary&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; pklzl(id,partition_key);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;MySQL starting from 5.7 shows which partitions are scanned in the execution plan (version 8.0 here).&lt;/p&gt;
&lt;p&gt;SELECT plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; STATUS ,FILE_ID ,DATE_UPDATED &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+---------+---------+-------+------+----------+-----------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; select_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitions &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; possible_keys &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; key_len &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; filtered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Extra &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+---------+---------+-------+------+----------+-----------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;derived3&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Start&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl_p20230601,lzl_p20230602,lzl_p20230603,lzl_p20230604,lzl_p20230605,lzl_p20230606,lzl_p20230607,lzl_p20230608,lzl_p20230609,lzl_p20230610,lzl_p20230611,lzl_p20230612,lzl_p20230613,lzl_p20230614,lzl_p20230615,lzl_p20230616,lzl_p20230617,lzl_p20230618,lzl_p20230619,lzl_p20230620,lzl_p20230621,lzl_p20230622,lzl_p20230623,lzl_p20230624,lzl_p20230625,lzl_p20230626,lzl_p20230627,lzl_p20230628,lzl_p20230629,lzl_p20230630 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t.id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;End&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DERIVED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl_p20230601,lzl_p20230602,lzl_p20230603,lzl_p20230604,lzl_p20230605,lzl_p20230606,lzl_p20230607,lzl_p20230608,lzl_p20230609,lzl_p20230610,lzl_p20230611,lzl_p20230612,lzl_p20230613,lzl_p20230614,lzl_p20230615,lzl_p20230616,lzl_p20230617,lzl_p20230618,lzl_p20230619,lzl_p20230620,lzl_p20230621,lzl_p20230622,lzl_p20230623,lzl_p20230624,lzl_p20230625,lzl_p20230626,lzl_p20230627,lzl_p20230628,lzl_p20230629,lzl_p20230630 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; const &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;UPDATE plan:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;explain&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;with&lt;/span&gt; t &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; (&lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;8723&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;limit&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt; )
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;update&lt;/span&gt; lzl &lt;span style="color:#66d9ef"&gt;set&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; STATUS &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;00&amp;#39;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; FILE_ID &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;null&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt; DATE_UPDATED &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;localtimestamp&lt;/span&gt;(&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;in&lt;/span&gt; ( &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; id &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; t);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+---------+---------+-------+------+----------+-----------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; select_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; partitions &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; possible_keys &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;key&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; key_len &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;rows&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; filtered &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Extra &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;+&lt;/span&gt;&lt;span style="color:#75715e"&gt;----+-------------+------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+---------------+---------+---------+-------+------+----------+-----------------+
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt;derived3&lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ALL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Start&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;UPDATE&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl_p20230601,lzl_p20230602,lzl_p20230603,lzl_p20230604,lzl_p20230605,lzl_p20230606,lzl_p20230607,lzl_p20230608,lzl_p20230609,lzl_p20230610,lzl_p20230611,lzl_p20230612,lzl_p20230613,lzl_p20230614,lzl_p20230615,lzl_p20230616,lzl_p20230617,lzl_p20230618,lzl_p20230619,lzl_p20230620,lzl_p20230621,lzl_p20230622,lzl_p20230623,lzl_p20230624,lzl_p20230625,lzl_p20230626,lzl_p20230627,lzl_p20230628,lzl_p20230629,lzl_p20230630 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t.id &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;End&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;temporary&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DERIVED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lzl_p20230601,lzl_p20230602,lzl_p20230603,lzl_p20230604,lzl_p20230605,lzl_p20230606,lzl_p20230607,lzl_p20230608,lzl_p20230609,lzl_p20230610,lzl_p20230611,lzl_p20230612,lzl_p20230613,lzl_p20230614,lzl_p20230615,lzl_p20230616,lzl_p20230617,lzl_p20230618,lzl_p20230619,lzl_p20230620,lzl_p20230621,lzl_p20230622,lzl_p20230623,lzl_p20230624,lzl_p20230625,lzl_p20230626,lzl_p20230627,lzl_p20230628,lzl_p20230629,lzl_p20230630 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;ref&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;PRIMARY&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; const &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;100&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;00&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Using&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;index&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;MySQL&amp;rsquo;s two execution plans are identical. However, the driving table selection could be better — const should be the driving table to reduce scan count.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Bug?
 &lt;div id="bug" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bug" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Bug Description
 &lt;div id="bug-description" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#bug-description" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://postgrespro.com/list/thread-id/2482006" target="_blank" rel="noreferrer"&gt;https://postgrespro.com/list/thread-id/2482006&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;This bug is easy to find via the error. It was submitted by digoal (德哥) back in 2020, followed by discussion between two source code experts. The discussion is lengthy, but to summarize: PG does not support unlimited partitions, which is understandable in the real world — too many partitions can cause rapid performance degradation. However, the community still felt the limit needed adjustment and discussed the &lt;code&gt;INNER_VAR&lt;/code&gt;, &lt;code&gt;Var.varno&lt;/code&gt; values in the source code.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Misleading Nature
 &lt;div id="misleading-nature" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#misleading-nature" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The bug title is somewhat misleading: &lt;em&gt;BUG #16302: too many range table entries - when count partition table(65538 childs)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The bug seems to say the number of partition child tables can&amp;rsquo;t exceed 65,538. The discussion also mentions &lt;em&gt;PG can handle up to 64K relations in a query&lt;/em&gt; — a query cannot have more than 64K relations.&lt;/p&gt;
&lt;p&gt;This is odd because our table has 400 partitions and still throws the error. In fact, both descriptions above are not entirely accurate. The 64K limit refers to the &amp;ldquo;tables&amp;rdquo; in the execution plan, which doesn&amp;rsquo;t exactly equal real tables. Of course, if tables or partitions exceed this count, there will be problems. But even without exceeding 64K, issues can arise, as in our case with only 400 partitions.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Fix
 &lt;div id="fix" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#fix" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The bug was submitted for version 12.2; our environment is 13.2.&lt;/p&gt;
&lt;p&gt;This bug is fixed in PG15. The source in &lt;code&gt;src/include/nodes/primnodes.h&lt;/code&gt; is different:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INNER_VAR		(-1)	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to inner subplan */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define OUTER_VAR		(-2)	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to outer subplan */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define INDEX_VAR		(-3)	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* reference to index column */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define ROWID_VAR		(-4)	&lt;/span&gt;&lt;span style="color:#75715e"&gt;/* row identity column during planning */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define IS_SPECIAL_VARNO(varno)		((int) (varno) &amp;lt; 0)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As discussed in the community, PG15 not only changed VAR values to negative numbers but also converted varno to 32-bit (4 billion), compared to the previous 16-bit (65,536).&lt;/p&gt;
&lt;p&gt;And in the function that previously threw the error, &lt;code&gt;add_rte_to_flat_rtable()&lt;/code&gt; in &lt;code&gt;src/backend/optimizer/plan/setrefs.c&lt;/code&gt;, the error code has been completely removed! The entire PG15 source code no longer contains &lt;code&gt;too many range table entries&lt;/code&gt;!&lt;/p&gt;

&lt;h2 class="relative group"&gt;Summary
 &lt;div id="summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;PG still has room for improvement in partitioned table optimization. PG treats child partitions as regular tables, unlike Oracle and MySQL. Oracle treats child partitions as segments distinct from tables. This causes PG to output the access method for every partition in the execution plan (when pruning doesn&amp;rsquo;t occur), making plans extremely long when there are many partitions. Oracle just writes &lt;code&gt;PARTITION RANGE ALL&lt;/code&gt;. MySQL also prints all partitions but doesn&amp;rsquo;t treat each partition&amp;rsquo;s access as a subquery, reducing plan complexity.&lt;/li&gt;
&lt;li&gt;Even when partitions haven&amp;rsquo;t reached 64K, you can still get &lt;code&gt;too many range table entries&lt;/code&gt;. This limit is actually on execution plan RTE count, not partition count (though if partition count reaches this number, RTE count will too, as mentioned — PG prints access methods for all partitions).&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;too many range table entries&lt;/code&gt; error is resolved in PG15.&lt;/li&gt;
&lt;li&gt;For versions below 15, don&amp;rsquo;t create too many partitions! You can also leverage partition pruning to reduce accessed partitions — in this case, simply adding a partition key condition to the WHERE clause would work.&lt;/li&gt;
&lt;/ul&gt;</content:encoded></item><item><title>Vector Database Core Concepts</title><link>https://lastdba.com/en/2024/08/12/vector-database-core-concepts/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/vector-database-core-concepts/</guid><description>&lt;h2 class="relative group"&gt;Vector Database Core Concepts
 &lt;div id="vector-database-core-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector-database-core-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;A Bit of History
 &lt;div id="a-bit-of-history" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-bit-of-history" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The development history of LLM models, from &lt;a href="https://arxiv.org/pdf/2304.13712" target="_blank" rel="noreferrer"&gt;Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond&lt;/a&gt;&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6913a42c261b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Many people only gradually learned about large models after the ChatGPT explosion, but in the years before that tipping point, the development of large models had already begun a war of the gods. Several institutions published many revolutionary papers — on the corporate side: Google, DeepMind, OpenAI, Meta, Microsoft; on the academic side: Stanford, Berkeley, CMU, Princeton, MIT&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;Vector Database Core Concepts
 &lt;div id="vector-database-core-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector-database-core-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;A Bit of History
 &lt;div id="a-bit-of-history" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#a-bit-of-history" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The development history of LLM models, from &lt;a href="https://arxiv.org/pdf/2304.13712" target="_blank" rel="noreferrer"&gt;Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond&lt;/a&gt;&lt;sup id="fnref:1"&gt;&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6913a42c261b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Many people only gradually learned about large models after the ChatGPT explosion, but in the years before that tipping point, the development of large models had already begun a war of the gods. Several institutions published many revolutionary papers — on the corporate side: Google, DeepMind, OpenAI, Meta, Microsoft; on the academic side: Stanford, Berkeley, CMU, Princeton, MIT&lt;sup id="fnref:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;There are three main camps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Google &amp;amp; DeepMind camp — Gemini, Bard&lt;/li&gt;
&lt;li&gt;Microsoft &amp;amp; OpenAI camp — ChatGPT, Bing&lt;/li&gt;
&lt;li&gt;Meta open-source community camp — Llama&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Timeline of recent large model product releases, from &lt;a href="https://arxiv.org/pdf/2303.18223.pdf" target="_blank" rel="noreferrer"&gt;A Survey of Large Language Models&lt;/a&gt;&lt;sup id="fnref:3"&gt;&lt;a href="#fn:3" class="footnote-ref" role="doc-noteref"&gt;3&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/517ff3855241.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;Generative AI Basics
 &lt;div id="generative-ai-basics" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#generative-ai-basics" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;AIGC (Artificial Intelligence Generated Content)&lt;/strong&gt;: The precise concept of AIGC is a mode of production that uses AI to automatically generate content. In a broader sense, AIGC can be approximated as AI technology trained to possess human-like generative and creative capabilities — i.e., Generative AI. It can autonomously generate and create new text, images, music, videos, 3D interactive content, and various other forms of content and data based on data and generative algorithm models, and even includes enabling new scientific discoveries and creating new meanings.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;LLM (Large Language Model)&lt;/strong&gt;: LLMs are large language models capable of capturing and processing complex language patterns and semantics — that is, they can understand and generate human language. GPT-3, ChatGPT, BERT, T5, ERNIE Bot, and others are typical large language models.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NLP (Natural Language Processing)&lt;/strong&gt;: Natural Language Processing (NLP) studies how to enable computers to read and understand human language — i.e., converting natural human language into instructions that computers can process. LLM is an important component of NLP.&lt;/p&gt;
&lt;p&gt;AIGC has achieved remarkable growth, largely due to Natural Language Processing (NLP), and the biggest driver behind NLP&amp;rsquo;s progress is the Large Language Model (LLM). This year (2024), AIGC is also developing rapidly in areas such as video and audio.&lt;sup id="fnref:4"&gt;&lt;a href="#fn:4" class="footnote-ref" role="doc-noteref"&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;prompt&lt;/strong&gt;: Instructions or directives — natural language provided to AI describing a task, used to guide a language model (such as GPT-3 or GPT-4) to generate the corresponding output&lt;sup id="fnref:5"&gt;&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref"&gt;5&lt;/a&gt;&lt;/sup&gt;. (Everyone basically knows what this is already, no need to elaborate.)&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;embedding&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;Embedding is a method of representing objects (such as text, images, and audio) as points in a continuous vector space, where the positions of these points in space carry semantic meaning for machine learning algorithms.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3449199f0a3f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Based on &lt;a href="https://nlp.stanford.edu/projects/glove/" target="_blank" rel="noreferrer"&gt;GloVe&lt;/a&gt; word-vector relevance for English words, there is an &lt;a href="https://blog.echen.me/embedding-explorer/#/" target="_blank" rel="noreferrer"&gt;interactive 2D embedding explorer&lt;/a&gt;. This shows natural language embedded as 2D vectors:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/97b468b62314.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;RAG
 &lt;div id="rag" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#rag" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;RAG (Retrieval-Augmented Generation) is a two-stage process consisting of document retrieval and large language model (LLM) answer generation. The initial stage leverages dense embeddings to retrieve documents. Depending on the specific use case, this retrieval can be based on various database formats, such as vector databases, summary indexes, tree indexes, and key indexes&lt;sup id="fnref1:5"&gt;&lt;a href="#fn:5" class="footnote-ref" role="doc-noteref"&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6eb500130d41.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://arxiv.org/pdf/2005.11401" target="_blank" rel="noreferrer"&gt;original RAG paper&lt;/a&gt;&lt;sup id="fnref:6"&gt;&lt;a href="#fn:6" class="footnote-ref" role="doc-noteref"&gt;6&lt;/a&gt;&lt;/sup&gt; was published on May 22, 2020, by researchers from Facebook (Meta), University College London, and New York University, proposing a general fine-tuning approach for RAG. RAG includes the following characteristics&lt;sup id="fnref1:2"&gt;&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref"&gt;2&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;RAG models combine pre-trained memory to assist language generation&lt;/li&gt;
&lt;li&gt;RAG models generate language that is more specific, diverse, and factual&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6ed7b3a3ae81.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;On March 23, 2023, OpenAI released the &lt;a href="https://github.com/openai/chatgpt-retrieval-plugin" target="_blank" rel="noreferrer"&gt;chatgpt-retrieval-plugin&lt;/a&gt; repository, recommending the use of vector databases in RAG. From that point on, vector databases gained widespread attention in the application domain, riding the wave of large model popularity.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b58b99f55a52.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;What Can Vector Databases Bring to AI?
 &lt;div id="what-can-vector-databases-bring-to-ai" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-can-vector-databases-bring-to-ai" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Vector databases can provide large models with data retrieval and long-term data storage capabilities within RAG&lt;sup id="fnref:7"&gt;&lt;a href="#fn:7" class="footnote-ref" role="doc-noteref"&gt;7&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f35b1bc4881b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Why use RAG? No words carry more weight than those of the master, OpenAI. The following passage is from the retrieval plugin usage guide released by OpenAI in March 2023&lt;sup id="fnref:8"&gt;&lt;a href="#fn:8" class="footnote-ref" role="doc-noteref"&gt;8&lt;/a&gt;&lt;/sup&gt;, translated by ChatGPT:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The open-source retrieval plugin enables ChatGPT to access personal or organizational information sources (with permission). Users can ask questions or express needs in natural language and obtain the most relevant document snippets from their data sources (such as files, notes, emails, or public documents).&lt;/p&gt;
&lt;p&gt;As an open-source and self-hosted solution, developers can deploy their own version of the plugin and register it with ChatGPT. The plugin leverages OpenAI&amp;rsquo;s embeddings and allows developers to choose a vector database (such as Milvus, Pinecone, Qdrant, Redis, Weaviate, or Zilliz) to index and search documents. Information sources can be synchronized with the database using webhooks.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In short, OpenAI recommends everyone use vector databases.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://mp.weixin.qq.com/s/0eBZ4zyX6XjBQO0GqlANnw" target="_blank" rel="noreferrer"&gt;Has the vector database cooled off?&lt;/a&gt; Not only has it not cooled off — RAG has developed to the point of being everywhere today — &lt;a href="https://mp.weixin.qq.com/s/awIInAtPOkZz_s4jg9TO_w" target="_blank" rel="noreferrer"&gt;Has RAG Technology Really Become &amp;ldquo;Commonplace&amp;rdquo;?&lt;/a&gt;. And vector databases, with their high retrieval efficiency, data storage reliability, and other characteristics, are an important part of RAG.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Common Vector Databases
 &lt;div id="common-vector-databases" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#common-vector-databases" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Since OpenAI released the RAG repo, many vector databases have emerged (though some existed before). Several companies have also secured considerable funding&lt;sup id="fnref:9"&gt;&lt;a href="#fn:9" class="footnote-ref" role="doc-noteref"&gt;9&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: left"&gt;Company&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Headquartered in&lt;/th&gt;
 &lt;th style="text-align: left"&gt;Funding&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Weaviate&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇳🇱 Amsterdam&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$68M Series B&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Qdrant&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇩🇪 Berlin&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$11M Seed&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Pinecone&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇺🇸 San Francisco&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$138M Series B&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Milvus/Zilliz&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇨🇳 / 🇺🇸 Redwood City&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$113M Series B&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Chroma&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇺🇸 San Francisco&lt;/td&gt;
 &lt;td style="text-align: left"&gt;$20M Seed&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;LanceDB&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇺🇸 San Francisco&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Venture&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Vespa&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇳🇴 / 🇺🇸 Indianapolis&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Yahoo!&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;Vald&lt;/td&gt;
 &lt;td style="text-align: left"&gt;🇯🇵 Tokyo&lt;/td&gt;
 &lt;td style="text-align: left"&gt;Yahoo! Japan&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Vector database release timeline:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/87c1f32c95b1.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/erikbern/ann-benchmarks" target="_blank" rel="noreferrer"&gt;Vector database performance comparison&lt;/a&gt;&lt;sup id="fnref:10"&gt;&lt;a href="#fn:10" class="footnote-ref" role="doc-noteref"&gt;10&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5d6c1d0ba8c2.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Dedicated vector databases generally perform better than traditional databases with vector plugins, for roughly two reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Dedicated vector databases are built with vector-specific underlying storage, and their performance is generally better than untargeted traditional databases.&lt;/li&gt;
&lt;li&gt;Dedicated vector databases are generally newer (mostly implemented in Go or Rust), making code-level optimization easier.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, this does not mean plugin-based vector databases have no place:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Traditional databases natively support more features, not just similarity computation.&lt;/li&gt;
&lt;li&gt;ACID — traditional database storage is safer.&lt;/li&gt;
&lt;li&gt;It&amp;rsquo;s easier to manipulate data within a single database.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Vector database feature comparison:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c1f5f45fa343.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The description of &lt;a href="https://github.com/pgvector/pgvector" target="_blank" rel="noreferrer"&gt;pgvector&lt;/a&gt; above is no longer entirely accurate — pgvector now supports HNSW, and the pgvector ecosystem project &lt;a href="https://github.com/timescale/pgvectorscale" target="_blank" rel="noreferrer"&gt;pgvectorscale&lt;/a&gt; also supports DiskANN.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Mathematical Concepts
 &lt;div id="mathematical-concepts" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#mathematical-concepts" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Mathematics says: &amp;ldquo;I stand on the mountaintop watching you all play.&amp;rdquo;&lt;/em&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Scalar&lt;/strong&gt;
 &lt;div id="scalar" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#scalar" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;A scalar is a specific number. Scalars have no direction and are generally defined in contrast to vectors.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Vector&lt;/strong&gt;
 &lt;div id="vector" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;In Euclidean space, a vector has both magnitude and direction. For example, vector &lt;strong&gt;a&lt;/strong&gt; from point &lt;em&gt;A&lt;/em&gt; to point &lt;em&gt;B&lt;/em&gt; (contains information about both points and direction)&lt;sup id="fnref:11"&gt;&lt;a href="#fn:11" class="footnote-ref" role="doc-noteref"&gt;11&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fa984d43877f.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Unit Vector&lt;/strong&gt;
 &lt;div id="unit-vector" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#unit-vector" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;A vector with magnitude one is a unit vector. The unit vector equals the vector divided by its Euclidean length&lt;sup id="fnref:12"&gt;&lt;a href="#fn:12" class="footnote-ref" role="doc-noteref"&gt;12&lt;/a&gt;&lt;/sup&gt;:
$$
\vec a = \frac{\mathbf a}{||\mathbf a||}
$$&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e9563146f301.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In mathematics, the &lt;a href="https://en.wikipedia.org/wiki/Unit_vector" target="_blank" rel="noreferrer"&gt;Unit Vector&lt;/a&gt; is called a &amp;ldquo;normalized vector&amp;rdquo; in pgvector and OpenAI embeddings. (Note: do not confuse this with the mathematical concept of the &lt;a href="https://en.wikipedia.org/wiki/Normal_%28geometry%29" target="_blank" rel="noreferrer"&gt;normal vector&lt;/a&gt; — a normal vector is a different concept entirely.)&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Why use unit vectors?&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;OpenAI embeddings&amp;rsquo; explanation for using unit vectors&lt;sup id="fnref:13"&gt;&lt;a href="#fn:13" class="footnote-ref" role="doc-noteref"&gt;13&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;OpenAI embeddings are normalized to length 1, which means that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cosine similarity can be computed slightly faster using just a dot product&lt;/li&gt;
&lt;li&gt;Cosine similarity and Euclidean distance will result in the identical rankings&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;h4 class="relative group"&gt;&lt;strong&gt;Sparse Vector&lt;/strong&gt;
 &lt;div id="sparse-vector" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#sparse-vector" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Sparse vectors are called &amp;ldquo;sparse&amp;rdquo; because the information in the vector is sparsely distributed. Typically, we need to find a few ones (relevant information) among thousands of zeros. Therefore, these vectors can contain many dimensions, usually in the tens of thousands.&lt;/p&gt;
&lt;p&gt;Comparison of sparse and dense vectors: Sparse vectors contain sparsely distributed bits of information, while dense vectors carry more information in every dimension — information-dense.&lt;sup id="fnref:14"&gt;&lt;a href="#fn:14" class="footnote-ref" role="doc-noteref"&gt;14&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6d7500917874.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Euclidean Space&lt;/strong&gt;
 &lt;div id="euclidean-space" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#euclidean-space" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Simply called Euclidean space, it is the most fundamental space in mathematics. In modern mathematics, a space of positive integer n dimensions is called Euclidean space.&lt;/p&gt;
&lt;p&gt;There are other space definitions, such as inner product space and Hilbert space. They differ in mathematical definitions, but in database/real-world contexts, the distinctions are not so fine-grained. The key takeaway is that inner product space, Euclidean space, and Hilbert space can all contain elements such as points, vectors, and inner products — we can simply call them &amp;ldquo;&lt;strong&gt;multi-dimensional spaces&lt;/strong&gt;&amp;rdquo;. For their differences, see &lt;a href="https://zhuanlan.zhihu.com/p/684643954" target="_blank" rel="noreferrer"&gt;A Casual Discussion of Various Spaces in Mathematics&lt;/a&gt;&lt;sup id="fnref:15"&gt;&lt;a href="#fn:15" class="footnote-ref" role="doc-noteref"&gt;15&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/a22c1c460ef1.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Euclidean Distance&lt;/strong&gt;
 &lt;div id="euclidean-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#euclidean-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Simply called Euclidean distance, this is what we generally think of as the distance between points — i.e., the length of a line segment&lt;sup id="fnref:16"&gt;&lt;a href="#fn:16" class="footnote-ref" role="doc-noteref"&gt;16&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/15196bf76dd3.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In 2D space, the Euclidean distance between points q and p is:
$$
d(\mathbf p,\mathbf q)=\sqrt{(p_1-q_1)^2+(p_2-q_2)^2}
$$&lt;/p&gt;
&lt;p&gt;In n-dimensional space, the Euclidean distance between points q and p is:
$$
d(\mathbf p,\mathbf q)=\sqrt{(p_1-q_1)^2+(p_2-q_2)^2+\cdots+(p_n-q_n)^2}
$$&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Manhattan Distance (or Taxicab Distance)&lt;/strong&gt;
 &lt;div id="manhattan-distance-or-taxicab-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#manhattan-distance-or-taxicab-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;$$
d(\mathbf p,\mathbf q)= \sum_{i=1}^n | p_i-q_i|
$$&lt;/p&gt;
&lt;p&gt;Manhattan distance is the sum of the absolute differences of two points across each dimension&lt;sup id="fnref:17"&gt;&lt;a href="#fn:17" class="footnote-ref" role="doc-noteref"&gt;17&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9d81381e5fb5.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In the figure above, the green line is Euclidean distance; the red, yellow, and blue lines are Manhattan distances.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Minkowski Distance&lt;/strong&gt;
 &lt;div id="minkowski-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#minkowski-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;$$
d(\mathbf a,\mathbf b)= \left( \sum_{i=1}^n | a_i-b_i|^p \right)^{1/p}
$$&lt;/p&gt;
&lt;p&gt;The figure below shows the distance from the origin to a point of unit length at different values of p in Minkowski distance&lt;sup id="fnref:18"&gt;&lt;a href="#fn:18" class="footnote-ref" role="doc-noteref"&gt;18&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/020d3a11e478.png" alt="image" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When p=1, it is Manhattan distance, also written as &amp;ldquo;L1 distance&amp;rdquo;&lt;/li&gt;
&lt;li&gt;When p=2, it is Euclidean distance, also written as &amp;ldquo;L2 distance&amp;rdquo;&lt;/li&gt;
&lt;li&gt;When p=n, it is Minkowski distance, also written as &amp;ldquo;Ln distance&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Cosine Similarity&lt;/strong&gt;
 &lt;div id="cosine-similarity" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#cosine-similarity" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The cosine value of the angle between two vectors — also called cosine similarity. Cosine similarity depends only on the angle between the two vectors, not on the vectors&amp;rsquo; lengths&lt;sup id="fnref:19"&gt;&lt;a href="#fn:19" class="footnote-ref" role="doc-noteref"&gt;19&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7c215f476c3e.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The smaller the angle between two vectors, the larger the cosine similarity. Value range: [-1, 1]. cos(0)=1, cos(90)=0, cos(180)=-1.&lt;/p&gt;
&lt;p&gt;Cosine similarity between two vectors is written as:
$$
cos (\theta)
$$
Expressed in vector form:
$$
cos (\theta)=\frac{\mathbf a\cdot \mathbf b }{||\mathbf a|| , ||\mathbf b||}= \frac{ \sum_{i=1}^n \mathbf a_i \mathbf b_i}{ \sqrt {\sum_{i=1}^n \mathbf a_i ^2} \cdot \sqrt {\sum_{i=1}^n \mathbf b_i ^2}}
$$&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c561d1e0ee46.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Inner Product&lt;/strong&gt;
 &lt;div id="inner-product" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#inner-product" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Also called the dot product, it can be used to represent the length and angle of vectors. The inner product equals the &lt;em&gt;Euclidean distance&lt;/em&gt; of the vectors multiplied by the &lt;em&gt;cosine of the angle between them&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;Inner product in 2D space:
$$
\mathbf a\cdot \mathbf b=||\mathbf a|| , ||\mathbf b||, cos \theta
$$
or
$$
\mathbf a\cdot \mathbf b= a_1 b_1 + a_2 b_2
$$
Inner product in n-dimensional space (&lt;strong&gt;a&lt;/strong&gt;=[a1,a2,···,an], &lt;strong&gt;b&lt;/strong&gt;=[b1,b2,···,bn]):
$$
\mathbf a\cdot \mathbf b=\sum_{i=1}^n a_ib_i= a_1b_1 + a_2b_2 + \cdots + a_nb_n
$$&lt;/p&gt;
&lt;p&gt;Now the following diagram should make sense. Using the formulas above, you can also reverse-engineer what the distance operators mean for n-dimensional vectors.&lt;/p&gt;
&lt;p&gt;They are: Euclidean distance, cosine distance, and inner product&lt;sup id="fnref:20"&gt;&lt;a href="#fn:20" class="footnote-ref" role="doc-noteref"&gt;20&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8b206ce5a7c9.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;All three can describe the similarity between two vectors.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Euclidean distance: contains only distance information between the two vectors&lt;/li&gt;
&lt;li&gt;Cosine distance: contains only angle information between the two vectors&lt;/li&gt;
&lt;li&gt;Inner product: contains both distance information and angle information&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Of course, there are more mathematical models for vector similarity computation, but it depends on whether the vector database supports them.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Jaccard Distance&lt;/strong&gt;
 &lt;div id="jaccard-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#jaccard-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;In short: intersection divided by union&lt;sup id="fnref:21"&gt;&lt;a href="#fn:21" class="footnote-ref" role="doc-noteref"&gt;21&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e5680f0330ab.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Formula:
$$
J(A,B)= \frac{|A\cap B| }{|A \cup B|}
$$&lt;/p&gt;
&lt;p&gt;Expressed in vectors, it computes the ratio of the count of equal elements to the count of unequal elements&lt;sup id="fnref:22"&gt;&lt;a href="#fn:22" class="footnote-ref" role="doc-noteref"&gt;22&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ee60ca0304e3.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Hamming Distance&lt;/strong&gt;
 &lt;div id="hamming-distance" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hamming-distance" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The number of differing positions between two strings or vectors of equal length&lt;sup id="fnref:23"&gt;&lt;a href="#fn:23" class="footnote-ref" role="doc-noteref"&gt;23&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Examples:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&amp;ldquo;ka&lt;strong&gt;rol&lt;/strong&gt;in&amp;rdquo; and &amp;ldquo;ka&lt;strong&gt;thr&lt;/strong&gt;in&amp;rdquo; is 3.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;k&lt;strong&gt;a&lt;/strong&gt;r&lt;strong&gt;ol&lt;/strong&gt;in&amp;rdquo; and &amp;ldquo;k&lt;strong&gt;e&lt;/strong&gt;r&lt;strong&gt;st&lt;/strong&gt;in&amp;rdquo; is 3.&lt;/li&gt;
&lt;li&gt;&amp;ldquo;k&lt;strong&gt;athr&lt;/strong&gt;in&amp;rdquo; and &amp;ldquo;k&lt;strong&gt;erst&lt;/strong&gt;in&amp;rdquo; is 4.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;0000&lt;/strong&gt; and &lt;strong&gt;1111&lt;/strong&gt; is 4.&lt;/li&gt;
&lt;li&gt;2&lt;strong&gt;17&lt;/strong&gt;3&lt;strong&gt;8&lt;/strong&gt;96 and 2&lt;strong&gt;23&lt;/strong&gt;3&lt;strong&gt;7&lt;/strong&gt;96 is 3.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Illustration&lt;sup id="fnref:24"&gt;&lt;a href="#fn:24" class="footnote-ref" role="doc-noteref"&gt;24&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/2df7e926cb52.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Delaunay Triangulation&lt;/strong&gt;
 &lt;div id="delaunay-triangulation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#delaunay-triangulation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Delaunay triangulation is an operation on a set of points in a plane. It subdivides the convex hull of these points (which contains multiple points) into multiple triangles, where the circumcircle of each triangle contains no point from the set. This maximizes the minimum angle among all triangles and tends to avoid producing skinny triangles&lt;sup id="fnref:25"&gt;&lt;a href="#fn:25" class="footnote-ref" role="doc-noteref"&gt;25&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Does NOT satisfy &amp;ldquo;the circumcircle of each triangle contains no point from the set&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/61a1cd01f71f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;DOES satisfy &amp;ldquo;the circumcircle of each triangle contains no point from the set&amp;rdquo;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/dfbb3c28e6e3.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;For example, triangulating a point set:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b5d7da74a98a.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;A valid triangulation:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b13e838ade76.png" alt="img" /&gt;&lt;/p&gt;
&lt;p&gt;Delaunay triangulation is not actually an algorithm — it merely defines what a &amp;ldquo;good&amp;rdquo; triangular mesh looks like. Its excellent properties are the empty-circle property and the maximized-minimum-angle property. These two properties avoid the creation of skinny triangles and make Delaunay triangulation widely applicable.&lt;/p&gt;

&lt;h4 class="relative group"&gt;&lt;strong&gt;Voronoi Diagram&lt;/strong&gt;
 &lt;div id="voronoi-diagram" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#voronoi-diagram" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Delaunay triangulation is a triangulation of a discrete point set P in general position, and it corresponds to the dual graph of P&amp;rsquo;s Voronoi diagram. The circumcenters of Delaunay triangles are the vertices of the Voronoi diagram. In 2D, Voronoi vertices are connected by edges, which can be derived from the adjacency relationships of Delaunay triangles: if two triangles share an edge in the Delaunay triangulation, their circumcenters should be connected by an edge in the Voronoi tessellation&lt;sup id="fnref:26"&gt;&lt;a href="#fn:26" class="footnote-ref" role="doc-noteref"&gt;26&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ea403a88c609.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The key property of a Voronoi diagram is: &lt;em&gt;the distance from a centroid to any point within its region is smaller than the distance from that point to any other centroid&lt;/em&gt;.
$$
R_k={x \in X ,|,d(x,P_k) \le d(x,P_j) ; \mathrm{for ,all },j \neq k}
$$
Rk is the centroid, d(x,Pk) is the distance from the centroid to any point within its region, and d(x,Pj) is the distance from other centroids to any point in that region.&lt;/p&gt;
&lt;p&gt;Due to different ways of computing the distance d, Voronoi diagrams can take on different appearances&lt;sup id="fnref:27"&gt;&lt;a href="#fn:27" class="footnote-ref" role="doc-noteref"&gt;27&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d7f64ffda7c7.png" alt="image" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;Vector Database Indexes
 &lt;div id="vector-database-indexes" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vector-database-indexes" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;

&lt;h3 class="relative group"&gt;Nearest Neighbor Search
 &lt;div id="nearest-neighbor-search" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#nearest-neighbor-search" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;ENN (Exact Nearest Neighbor)&lt;/strong&gt;: Finding the point or vector closest to a query point in a given dataset. This method guarantees the highest accuracy, but as the dataset size increases, the computational cost rises sharply because it requires evaluating the distance between the query point and every point in the dataset.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ANN (Approximate Nearest Neighbor)&lt;/strong&gt;: To improve efficiency, approximately finding the nearest point to the query point at the cost of some accuracy. This method is implemented through various algorithms and can significantly reduce computational cost, especially effective when dealing with large-scale datasets.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;KNN (K-Nearest Neighbors)&lt;/strong&gt;: A commonly used machine learning algorithm that works by finding the K nearest neighbors to a given query point in the dataset.&lt;/p&gt;

&lt;h3 class="relative group"&gt;Index Evaluation Criteria
 &lt;div id="index-evaluation-criteria" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#index-evaluation-criteria" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;Evaluating the quality of an index always depends on the specific data model, but in general, it includes the following points:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Query time&lt;/strong&gt;: Query speed is critical, especially important in large model contexts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Query quality&lt;/strong&gt;: ANN queries won&amp;rsquo;t always return perfectly accurate results, but the query quality must not deviate too much. Query quality has many metrics, the most common being recall.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory consumption&lt;/strong&gt;: The memory consumed by the query index — searching in memory is clearly faster than searching on disk.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Training time&lt;/strong&gt;: Some search methods require training to reach a good state.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Write time&lt;/strong&gt;: The impact on the index when writing vectors, including all maintenance.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most of these metrics are straightforward. Here we&amp;rsquo;ll focus on &lt;em&gt;query quality&lt;/em&gt;:&lt;/p&gt;
&lt;p&gt;In ANN search, results are not always exact. When searching a set of elements, the concepts include: the query scope (retrieved elements), all correct elements (relevant elements), the returned correct elements (true positives), and the returned incorrect elements (false positives)&lt;sup id="fnref:28"&gt;&lt;a href="#fn:28" class="footnote-ref" role="doc-noteref"&gt;28&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/7712556dface.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;TP = True positive; FP = False positive; TN = True negative; FN = False negative&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;:
$$
Accuracy=\frac{TP+TN}{TP+FP+TN+FN}
$$
or:
$$
Accuracy=\frac{\text{all correct elements}}{\text{all elements}}
$$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Precision&lt;/strong&gt;:
$$
Precision=\frac{TP}{TP+FP}
$$
or:
$$
Precision=\frac{\text{retrieved correct elements}}{\text{all retrieved elements}}
$$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Recall&lt;/strong&gt;:
$$
Recall=\frac{TP}{TP+FN}
$$
or:
$$
Recall=\frac{\text{retrieved correct elements}}{\text{all correct elements}}
$$&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;F-measure&lt;/strong&gt;: Equivalent to weighted precision and recall
$$
Recall=2 \cdot \frac{precision \cdot recall}{precision+recall}
$$&lt;/p&gt;
&lt;p&gt;Example: Consider a computer program designed to identify dogs (and related elements) in digital photos. When processing a photo containing ten cats and twelve dogs, the program identifies eight dogs. Among the eight identified as dogs, only five are actually dogs (true positives), while the other three are cats (false positives). Seven dogs were missed (false negatives), and seven cats were correctly excluded (true negatives). For this program:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Accuracy = 12/(10+12) (largely independent of the identification program itself)&lt;/li&gt;
&lt;li&gt;Precision = 5/8 (true positives / all retrieved elements)&lt;/li&gt;
&lt;li&gt;Recall = 5/12 (true positives / all correct elements)&lt;/li&gt;
&lt;li&gt;F-measure = 2*[(5/18)*(5/12)]/[(5/18)+(5/12)]&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Locality-Sensitive Hashing (LSH)
 &lt;div id="locality-sensitive-hashing-lsh" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#locality-sensitive-hashing-lsh" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;LSH is a method for narrowing the search scope by converting data vectors into hash values while preserving information about their similarity.&lt;/p&gt;

&lt;h4 class="relative group"&gt;LSH Construction
 &lt;div id="lsh-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsh-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;LSH has many implementations. Here we introduce the more traditional one. This traditional LSH implementation consists of three parts&lt;sup id="fnref1:22"&gt;&lt;a href="#fn:22" class="footnote-ref" role="doc-noteref"&gt;22&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Shingling&lt;/strong&gt;: Encode the original text into vectors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MinHashing&lt;/strong&gt;: Convert the vectors into a special representation called a signature, used for comparing similarity between them.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LSH function&lt;/strong&gt;: Hash the signatures into different buckets. If a pair of vectors&amp;rsquo; signatures fall into the same bucket at least once, they are considered candidates.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 class="relative group"&gt;Shingling
 &lt;div id="shingling" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#shingling" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Shingling is a method of embedding (in my personal opinion). Shingling identifies natural language as k consecutive tokens, with duplicate tokens removed&lt;sup id="fnref2:22"&gt;&lt;a href="#fn:22" class="footnote-ref" role="doc-noteref"&gt;22&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/45d8beeaced2.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;At this point, we have a set of tokens based on k-grams. The next step is to convert them into vectors.&lt;/p&gt;
&lt;p&gt;Start with an all-zero vector, whose length equals the length of the token set. Set the position corresponding to each token to 1:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/288f0f1f9f2c.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The final result is a very long vector containing only 0s and 1s, where the vector&amp;rsquo;s information captures the semantics of a sentence.&lt;/p&gt;

&lt;h4 class="relative group"&gt;MinHashing
 &lt;div id="minhashing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#minhashing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Since the vector dimensionality is extremely high, directly computing approximate distances using one-hot encoded vectors yields very poor results. We need to convert sparse vectors into dense vectors — this process is called MinHashing in LSH, and the converted vector is called a MinHashing signature.&lt;/p&gt;
&lt;p&gt;MinHashing can be a bit tricky for beginners at first, but once you grasp it, you&amp;rsquo;ll find it very simple.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;MinHashing is a hash function that permutes the components of an input vector and then returns the first index where the permuted vector component equals 1.&lt;/p&gt;
&lt;/blockquote&gt;&lt;ol&gt;
&lt;li&gt;First, apply a permutation: rearrange the components of a vector.&lt;/li&gt;
&lt;li&gt;Return the index of the first element that equals 1 after permutation.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For example:&lt;/p&gt;
&lt;p&gt;u1 vector (0,0,1,1,0): after the first random permutation, the corresponding index is 0; after the second random permutation, the corresponding index is 0&lt;sup id="fnref:29"&gt;&lt;a href="#fn:29" class="footnote-ref" role="doc-noteref"&gt;29&lt;/a&gt;&lt;/sup&gt;. u1&amp;rsquo;s MinHashing signature is (0,0).&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c5997f710719.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;In practice, multiple minhash values can be used to approximately compute the Jaccard similarity between vectors. The more minhash values used, the more accurate the approximation.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9c95377842ce.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;LSH Function
 &lt;div id="lsh-function" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsh-function" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Even after converting sparse vectors into dense vectors, the dense vectors can still have high dimensionality, making direct retrieval inefficient.&lt;/p&gt;
&lt;p&gt;We can improve query efficiency using hash tables. However, note that using a completely random hash algorithm easily places nearby vectors into different hash buckets. We need a hash algorithm that places nearby vectors into the &lt;em&gt;same&lt;/em&gt; hash bucket — this is LSH: Locality-Sensitive Hashing.&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The LSH mechanism builds a hash table consisting of several parts which puts a pair of signatures into the same bucket if they have at least one corresponding part.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The concept of locality-sensitive hashing is also simple: split the signature into bands, compute hash values for each sub-signature band, and designate those with colliding sub-hash values as candidates.&lt;/p&gt;
&lt;p&gt;The following example is easy to understand — read through it:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f95a92163fc8.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Thinking in terms of extremes: b=1 means no banding at all — direct hashing, completely defeating the purpose of LSH. b=number of signature elements means one band per element, i.e., one hash value per element — this can achieve relatively accurate approximate comparison, but it imposes a massive burden on computation and memory.&lt;/p&gt;

&lt;h4 class="relative group"&gt;LSH Parameters and Error Rate
 &lt;div id="lsh-parameters-and-error-rate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#lsh-parameters-and-error-rate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The probability that a vector becomes a candidate vector directly affects recall. The probability of a candidate vector is as follows, where:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;s represents similarity&lt;/li&gt;
&lt;li&gt;b represents the number of bands&lt;/li&gt;
&lt;li&gt;r represents the number of rows per band&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/11d2d0ff3d63.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;If we plot P against s using the formula, the relationship between vector similarity and candidate probability is as follows:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d9a7f7f7c269.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5ce91d839ace.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The larger the number of bands b, the smaller the candidate similarity probability.&lt;/p&gt;
&lt;p&gt;At the same time, adjusting b and s affects P, and P is related to FP and TN.&lt;/p&gt;
&lt;p&gt;For example, returning more candidates naturally leads to more false positives — i.e., returning non-similar &amp;ldquo;candidate pairs.&amp;rdquo; This is an inevitable consequence of modifying the parameter b.&lt;/p&gt;
&lt;p&gt;TP = True positive; FP = False positive; TN = True negative; FN = False negative&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/888c9b18576f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;LSH is susceptible to high-dimensional data: more dimensions require longer signatures and more computation to maintain good search quality. In such cases, other indexes are recommended.&lt;/p&gt;

&lt;h4 class="relative group"&gt;More
 &lt;div id="more" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#more" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;There are two more articles I haven&amp;rsquo;t finished digesting — they seem to be related to binary vectors and Euclidean distance:&lt;/p&gt;
&lt;p&gt;&lt;a href="https://towardsdatascience.com/similarity-search-part-6-random-projections-with-lsh-forest-f2e9b31dcc47" target="_blank" rel="noreferrer"&gt;https://towardsdatascience.com/similarity-search-part-6-random-projections-with-lsh-forest-f2e9b31dcc47&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href="https://towardsdatascience.com/similarity-search-part-7-lsh-compositions-1b2ae8239aca" target="_blank" rel="noreferrer"&gt;https://towardsdatascience.com/similarity-search-part-7-lsh-compositions-1b2ae8239aca&lt;/a&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;HNSW Index
 &lt;div id="hnsw-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The HNSW algorithm (Hierarchical Navigable Small World) is a multi-layer graph-based proximity algorithm. HNSW is currently one of the most popular vector index algorithms.&lt;/p&gt;
&lt;p&gt;At a high level, HNSW is based on the &lt;a href="https://en.wikipedia.org/wiki/Small-world_network" target="_blank" rel="noreferrer"&gt;Small World Theory&lt;/a&gt;. The Small World Theory originally stems from the &lt;a href="https://en.wikipedia.org/wiki/Six_degrees_of_separation" target="_blank" rel="noreferrer"&gt;Six Degrees of Separation&lt;/a&gt; theory in social psychology — any two people can be connected through at most five layers of social relationships. In other words, any two people on Earth can be connected through at most six steps of social connections. The Small World Theory was later widely accepted through experimental and empirical evidence and extended to non-social relationship networks. Note that the Small World Theory is a phenomenon.&lt;/p&gt;
&lt;p&gt;In short, the Small World Theory explains that &amp;ldquo;&lt;em&gt;the connection between two entities is actually very short&lt;/em&gt;.&amp;rdquo; What HNSW does is establish connections between elements and reduce the number of connections.&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Index Construction
 &lt;div id="hnsw-index-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Let&amp;rsquo;s look at the HNSW paper&amp;rsquo;s algorithm for constructing HNSW graph layers&lt;sup id="fnref:30"&gt;&lt;a href="#fn:30" class="footnote-ref" role="doc-noteref"&gt;30&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/da93d451c90f.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Several elements in the construction algorithm are important:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;M&lt;/strong&gt; is the number of new edges (connections) added, representing the number of new edges for a newly inserted node.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mmax&lt;/strong&gt; is the maximum number of edges per node. If neighboring nodes are inserted continuously, the edge count of existing neighboring nodes could keep increasing, wasting computational resources during search. When inserting a new node causes an existing neighboring node&amp;rsquo;s edge count to exceed Mmax, shrink connection is needed.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;efConstruction&lt;/strong&gt; is the set of neighboring nodes.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Construction illustration&lt;sup id="fnref:31"&gt;&lt;a href="#fn:31" class="footnote-ref" role="doc-noteref"&gt;31&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/79c887052aca.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Steps for HNSW node insertion (without shrink connection)&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;When a new node is inserted, first find neighboring nodes at the top layer using &lt;em&gt;efConstruction&lt;/em&gt;. Use the found nearest neighbor as the entry point to descend to the next layer, then continue searching for neighbors using that layer&amp;rsquo;s &lt;em&gt;efConstruction&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Perform node insertion at a certain layer (e.g., L=2). Select M nodes from &lt;em&gt;efConstruction&lt;/em&gt; and connect them to the new node — at this point, 1 new node is added with M edges connected to it.&lt;/li&gt;
&lt;li&gt;Repeat step 2 until reaching the bottom layer (layer0).&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 class="relative group"&gt;HNSW Heuristic Neighbor Selection
 &lt;div id="hnsw-heuristic-neighbor-selection" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-heuristic-neighbor-selection" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The basic HNSW index structure construction has another problem: if two clusters are relatively far apart, according to the basic HNSW construction algorithm, the two clusters are almost impossible to connect, because the basic HNSW construction algorithm is built on the nearest neighbor nodes in &lt;em&gt;efConstruction&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;The &lt;a href="https://arxiv.org/pdf/1603.09320" target="_blank" rel="noreferrer"&gt;HNSW original paper&lt;/a&gt; not only proposed the basic HNSW construction algorithm but also introduced a heuristic algorithm for solving the isolated cluster problem:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/0ccf3de6e6ec.png" alt="image" /&gt;&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Fig.2 Heuristic for selecting graph neighbors for two isolated clusters. A new element is inserted on the boundary of cluster 1. All the element&amp;rsquo;s nearest neighbors belong to cluster 1, thus missing the Delaunay triangulation edges between the clusters. However, the heuristic selects element e2 from cluster 2, so if the inserted element is closer to e2 than to any other element from cluster 1, global connectivity is maintained.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;&lt;strong&gt;&amp;ldquo;The heuristic algorithm not only considers the nearest distance between nodes in the graph but also considers connectivity between different regions of the graph.&amp;rdquo;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As shown below, when adding node X, the heuristic algorithm should be applied here — establishing connectivity with cluster A, rather than simply adding to the nearest neighbor nodes:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b7393a1117ad.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Index Search
 &lt;div id="hnsw-index-search" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-search" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The main logic of HNSW&amp;rsquo;s KNN search method as described in the &lt;a href="https://arxiv.org/pdf/1603.09320" target="_blank" rel="noreferrer"&gt;HNSW original paper&lt;/a&gt; consists of the following two algorithms:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b4bbc841673b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/b7ec9e80d0b9.png" alt="image" /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Algorithm 2 appears slightly more complex, but the logic is actually simple — Algorithm 2 finds the set of nearest neighbor nodes &lt;strong&gt;ef&lt;/strong&gt; for &lt;strong&gt;q&lt;/strong&gt; at that layer. In simple terms, Algorithm 2 adds candidate nodes to the ef set, compares distances, and removes the farthest nodes, so the returned W is the ef for q at that layer.&lt;/li&gt;
&lt;li&gt;Algorithm 5 returns the K nearest neighbor nodes of q. It calls Algorithm 2 twice (or more). The first line in the for loop has input parameter ef=1, meaning non-bottom layers only find the single nearest ep (entry point). The bottom layer (lc=0) returns the K nearest neighbor node set W.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/3154b27761ee.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Complexity
 &lt;div id="hnsw-complexity" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-complexity" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;The number of HNSW layers is a function of log(N).&lt;/p&gt;
&lt;p&gt;Search complexity: Complexity can be rigorously evaluated in a Delaunay graph, with the average complexity being O(log(N)) (for non-Delaunay graphs, such as graphs with heuristic neighbor selection, the paper does not provide a specific complexity formula).&lt;/p&gt;
&lt;p&gt;Construction complexity: HNSW is constructed by iteratively inserting all elements, with average complexity O(N∙log(N)).&lt;/p&gt;

&lt;h4 class="relative group"&gt;HNSW Index Parameters
 &lt;div id="hnsw-index-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#hnsw-index-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Generally, HNSW indexes for vector data have several adjustable parameters that affect index construction speed, recall, etc. Different databases may have slightly different parameters. Here we use pgvector&amp;rsquo;s HNSW parameters as an example:&lt;/p&gt;
&lt;p&gt;Index construction parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;m&lt;/strong&gt;: Maximum number of edges per vector, default 16. Equivalent to Mmax in the paper.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ef_construction&lt;/strong&gt;: Number of vectors in the neighbor list during index construction, default 64. Equivalent to ef_construction in the paper.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Index search parameters:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;hnsw.ef_search&lt;/strong&gt;: Adjusts the number of vectors in the neighbor list during search (also equivalent to ef_construction in the paper). Must be greater than or equal to limit.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Impact of adjusting ef_construction on creation time and recall during index construction&lt;sup id="fnref1:20"&gt;&lt;a href="#fn:20" class="footnote-ref" role="doc-noteref"&gt;20&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/44a379598309.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Increasing ef_construction improves recall but extends index creation time. After ef_construction=256, index construction time increases noticeably but recall improvement is not obvious.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ebd3ed59025b.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Increasing m also improves recall and extends index creation time. After m=36, index construction time increases noticeably but recall improvement is not obvious.&lt;/p&gt;
&lt;p&gt;Similarly, increasing hnsw.ef_search improves recall at the cost of performance.&lt;/p&gt;

&lt;h3 class="relative group"&gt;IVFFlat Index
 &lt;div id="ivfflat-index" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ivfflat-index" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;IVFFlat stands for Inverted File with Flat Compression. (What&amp;rsquo;s the relationship with &amp;ldquo;invert&amp;rdquo;? Do all indexes that can&amp;rsquo;t be categorized get called inverted?) The core concept of the IVFFlat index is based on the Voronoi diagram:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The key property of a Voronoi diagram is: &lt;em&gt;the distance from a centroid to any point within its region is smaller than the distance from that point to any other centroid&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;This property is expressed in formula form:
$$
R_k={x \in X ,|,d(x,P_k) \le d(x,P_j) ; \mathrm{for ,all },j \neq k}
$$
Rk is the centroid, d(x,Pk) is the distance from the centroid to any point within its region, and d(x,Pj) is the distance from other centroids to any point in that region.&lt;/p&gt;
&lt;p&gt;Using this concept, we can partition many vectors into regions by setting centroids, and then use the Voronoi diagram property to roughly find nearby points.&lt;/p&gt;

&lt;h4 class="relative group"&gt;IVFFlat Index Construction
 &lt;div id="ivfflat-index-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ivfflat-index-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Let&amp;rsquo;s reduce high-dimensional space to 2D for understanding IVFFlat index construction&lt;sup id="fnref:32"&gt;&lt;a href="#fn:32" class="footnote-ref" role="doc-noteref"&gt;32&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;For example, the following set of X marks represents points (or vectors). Suppose we have three centroids:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/370e9bbd8bac.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The three centroids partition 3 Voronoi cells, and all points are assigned to their respective Voronoi cells:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/8e9e1aed5c5b.png" alt="image" /&gt;&lt;/p&gt;

&lt;h4 class="relative group"&gt;IVFFlat Index Search
 &lt;div id="ivfflat-index-search" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ivfflat-index-search" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Now there is a query node. Compute its distance to all centroids, find the nearest centroid, and the cell containing that centroid is the region to search next. Finally, within that region, find the neighboring nodes&lt;sup id="fnref:33"&gt;&lt;a href="#fn:33" class="footnote-ref" role="doc-noteref"&gt;33&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/bad429ed41be.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Boundary Problem&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;The above search path has a boundary problem. When the query is near a region boundary, if the true nearest node is in another region, the algorithm of &amp;ldquo;only searching for neighboring nodes within the region&amp;rdquo; will not find the true nearest neighbor.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9298bd73b504.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The boundary problem is fundamentally because:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Voronoi diagram only guarantees that the distance from a node to its own region&amp;rsquo;s centroid is smaller than the distance to other centroids, but it does NOT guarantee that the distance from a node to other nodes in its own region is smaller than the distance to nodes in other regions.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This problem can be mitigated by increasing the number of regions searched. For example, increasing the number of regions searched from 1 to 3:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9e9493428d53.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Increasing the number of search regions is generally set as a parameter in databases, such as &lt;code&gt;ivfflat.probes&lt;/code&gt; in pgvector.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;IVFFlat Search Summary&lt;/strong&gt;:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Compute the distance from the query node to all other centroids, find the nearest one.&lt;/li&gt;
&lt;li&gt;Based on the input parameter for the number of cells to query (e.g., probes), search for neighboring points in the top &lt;code&gt;probes&lt;/code&gt; cells.&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 class="relative group"&gt;IVFFlat Index Parameters
 &lt;div id="ivfflat-index-parameters" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#ivfflat-index-parameters" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Similarly, vector databases that support IVFFlat indexes generally have at least two parameters: &lt;code&gt;list&lt;/code&gt; and &lt;code&gt;probe&lt;/code&gt;. These parameters affect index search performance and recall. Here we use Faiss parameters as an example&lt;sup id="fnref1:32"&gt;&lt;a href="#fn:32" class="footnote-ref" role="doc-noteref"&gt;32&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;nlist&lt;/strong&gt;: Number of regions to construct. Increasing nlist increases the time to search for the nearest centroid but reduces the time to search for nodes within a region.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;nprobe&lt;/strong&gt;: Number of regions to search. Increasing nprobe increases the number of regions searched, which obviously reduces search performance but improves recall.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Theoretically, for nlist, it&amp;rsquo;s best to test specifically against the structure of the vector data and the database type — increasing nlist does not always reduce response time. For nprobe, increasing nprobe definitely reduces search performance and improves recall, but making nprobe too large is meaningless and goes against the original intent of ANN.&lt;/p&gt;
&lt;p&gt;The following is from Pinecone&amp;rsquo;s performance testing of the Faiss IVFFlat index:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/5ca487b356dd.png" alt="image" /&gt;&lt;/p&gt;

&lt;h3 class="relative group"&gt;PQ Product Quantization
 &lt;div id="pq-product-quantization" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pq-product-quantization" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;One million dense vectors may require gigabytes of memory, and real-world vectors far exceed this number. Without management, similarity vector search can require enormous amounts of memory — yet RAM is limited. Vector size increases with vector dimensionality and the number of vectors.&lt;/p&gt;
&lt;p&gt;Product Quantization (PQ) aims to reduce memory usage and can also improve query speed (because the amount of computation is reduced). PQ is a lossy compression method, which leads to reduced vector retrieval accuracy, but this is acceptable within ANN requirements.&lt;/p&gt;
&lt;p&gt;PQ&amp;rsquo;s algorithm logic is slightly more complex than other algorithms. I strongly recommend this article: &lt;a href="https://towardsdatascience.com/similarity-search-product-quantization-b2a1a6397701" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 2: Product Quantization&lt;/a&gt;&lt;sup id="fnref:34"&gt;&lt;a href="#fn:34" class="footnote-ref" role="doc-noteref"&gt;34&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PQ Construction
 &lt;div id="pq-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pq-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/56b112821338.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Step description:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Subvectors&lt;/strong&gt; — Split the original high-dimensional vector into n low-dimensional sub-vectors.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Codebook&lt;/strong&gt; — Use the k-means algorithm (or other algorithms) to compute the Voronoi diagram for &lt;em&gt;each&lt;/em&gt; set of all sub-vectors, producing n different Voronoi diagrams. These Voronoi diagrams are the codebooks (assuming each Voronoi diagram has k centroids).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clustering&lt;/strong&gt; — Place the n sub-vectors into their respective already-clustered Voronoi diagrams and compute the nearest centroid.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quantized vectors&lt;/strong&gt; — Take these n nearest centroids as the new vector — the quantized vector.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reproduction values&lt;/strong&gt; — Take the &lt;em&gt;nearest centroid index&lt;/em&gt; for each of the n subspaces as new values; the combined new values are called the PQ code.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Step 5, reproduction values, in detail:&lt;/p&gt;
&lt;p&gt;Based on the n sub-vectors and the k centroids in each subspace, we obtain an n×k centroid matrix. Taking the index of the nearest centroid for each sub-vector gives the PQ code.&lt;/p&gt;
&lt;p&gt;(btw: to be rigorous, all element indices in the diagram below should start from 1, not 0.)&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/fc2938307d7e.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The new PQ code is equivalent to a lossy-compressed new vector (reproduction value) of the original vector. New distance calculations can directly compute the L2 distance of the PQ codes.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PQ Retrieval
 &lt;div id="pq-retrieval" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pq-retrieval" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Based on the PQ original paper&lt;sup id="fnref:35"&gt;&lt;a href="#fn:35" class="footnote-ref" role="doc-noteref"&gt;35&lt;/a&gt;&lt;/sup&gt;, there are two PQ retrieval modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Symmetric mode&lt;/strong&gt;: The distance between vector x and vector y is approximated by the distance between their centroids q(x) and q(y). In other words, the distance between two vectors can be approximated by the distance between their PQ codes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Asymmetric mode&lt;/strong&gt;: The distance between vector x and vector y is approximated by the distance from x to the centroid q(y). In other words, the distance between two vectors can be computed using the original query vector value and the other vector&amp;rsquo;s PQ code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/d3704a6f01b5.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Clearly, the distance accuracy differs between the two modes:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/9697622aad6e.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The figure above shows the distance accuracy between two vectors under different modes, with 8 subspaces and 256 centroids. It can be seen that the asymmetric mode has higher accuracy than the symmetric mode.&lt;/p&gt;
&lt;p&gt;When comparing distances between two vectors, the symmetric and asymmetric distance computation models are quite useful. However, in the scenario of finding PQ approximate vectors, there are some differences — especially the symmetric mode, where distortion can be quite severe:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The symmetric mode&amp;rsquo;s query speed is very fast because the code table has already been computed and preserved during the PQ construction process. You only need to first compute the query vector x&amp;rsquo;s PQ code via the code table (minimal computation), then reverse-lookup the code table to get the corresponding sub-code-table — all vectors in this sub-code-table are approximate vectors at equal distance. This method requires extremely little computation — just a direct table lookup.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The symmetric mode&amp;rsquo;s distortion is relatively severe (the two figures above don&amp;rsquo;t fully capture it — imagine it as a Voronoi diagram where one cell contains multiple vectors, and you&amp;rsquo;ll realize how severe the symmetric distortion can be). The asymmetric mode can &lt;em&gt;slightly&lt;/em&gt; alleviate this problem. In asymmetric mode, first compute the PQ code of vector x, then similarly reverse-lookup the code table to get the corresponding sub-code-table, then compute distances between vector x and the vectors in this sub-code-table to obtain KNN. Its computational cost is n×km (n = number of subspaces, km ≈ total vector count / centroid count).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/25945e785366.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Asymmetric mode requires finding the centroid via the PQ code, then searching for KNN within the subspace where the centroid resides. The distance between the query vector x and an existing vector y is approximated by the distance between x and y&amp;rsquo;s centroid.&lt;/p&gt;
&lt;p&gt;PQ asymmetric retrieval&lt;sup id="fnref1:34"&gt;&lt;a href="#fn:34" class="footnote-ref" role="doc-noteref"&gt;34&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/ba66cc8da1b9.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/f08ffd4fc669.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;Steps of PQ asymmetric retrieval:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Split the query vector into multiple sub-vectors.&lt;/li&gt;
&lt;li&gt;Compute the distance between sub-vectors and the centroid matrix.&lt;/li&gt;
&lt;li&gt;Take the nearest centroid in each subspace as the query vector&amp;rsquo;s PQ code.&lt;/li&gt;
&lt;li&gt;Compute the approximate distance using the query vector and the centroid corresponding to the PQ code. Distances can be computed independently in each subspace and then summed.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;As mentioned earlier, asymmetric mode&amp;rsquo;s approximate distance computation is slightly better than symmetric mode, but in some scenarios, the asymmetric distance can still deviate significantly from the actual distance:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/6213fe57a4fc.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;This is easier to understand from the figure above. Within the same cell, the distance between the farthest vector and the centroid can differ significantly from the distance between the closest vector and the centroid. Computing only the partial distance to the centroid cannot capture this difference.&lt;/p&gt;

&lt;h4 class="relative group"&gt;PQ Parameters and Their Impact
 &lt;div id="pq-parameters-and-their-impact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#pq-parameters-and-their-impact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PQ has at least two parameters that significantly affect performance and memory: the number of subspaces m and the number of centroids per subspace k.&lt;/p&gt;
&lt;p&gt;Recall:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The product quantizer is parametrized by the number of subvectors m and the number of quantizers per subvector k*, producing a code of length m × log2 k&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;With m subspaces, each having k* centroids, the length (in bits) of a PQ code is&lt;sup id="fnref1:35"&gt;&lt;a href="#fn:35" class="footnote-ref" role="doc-noteref"&gt;35&lt;/a&gt;&lt;/sup&gt;:
$$
code ; length , (bits)=m \cdot \log_2 k^*
$$&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/c89aa21160f6.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The more subspaces m, the higher the recall; the longer the PQ code, the higher the recall. Longer PQ code essentially means more centroids. Note that the specific values here are based on the paper&amp;rsquo;s dataset.&lt;/p&gt;
&lt;p&gt;Memory and complexity:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/06d80385f539.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;k represents the number of cluster centroids, D represents dimension, m represents the number of subspaces. k* represents centroids within a subspace, D* represents dimensions within a subspace.&lt;/p&gt;
&lt;p&gt;For example, with k=2048, D=128, m=8, the complexity is as follows&lt;sup id="fnref:36"&gt;&lt;a href="#fn:36" class="footnote-ref" role="doc-noteref"&gt;36&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th style="text-align: left"&gt;Operation&lt;/th&gt;
 &lt;th style="text-align: center"&gt;Memory and complexity&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;k-means&lt;/td&gt;
 &lt;td style="text-align: center"&gt;kD = 2048×128 = 262144&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td style="text-align: left"&gt;PQ&lt;/td&gt;
 &lt;td style="text-align: center"&gt;mk&lt;em&gt;D&lt;/em&gt; = (k^(1/m))×D = (2048^(1/8))×128 = 332&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;It can be seen that PQ significantly reduces complexity during search.&lt;/p&gt;

&lt;h3 class="relative group"&gt;DiskANN &amp;amp; Vamana
 &lt;div id="diskann--vamana" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#diskann--vamana" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;p&gt;The &lt;a href="https://suhasjs.github.io/files/diskann_neurips19.pdf" target="_blank" rel="noreferrer"&gt;DiskANN original paper&lt;/a&gt; Abstract&lt;sup id="fnref:37"&gt;&lt;a href="#fn:37" class="footnote-ref" role="doc-noteref"&gt;37&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Current state-of-the-art approximate nearest neighbor search (ANNS) algorithms generate indices that must be stored in main memory for fast high-recall search. This makes them expensive and limits the size of the dataset. We present a new graph-based indexing and search system called DiskANN that can index, store, and search a billion point database on a single workstation with just 64GB RAM and an inexpensive solid-state drive (SSD).&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;At the time (the paper was published in 2019), state-of-the-art ANN algorithms all relied on RAM for high recall and performance. This approach was not only expensive but also limited dataset size. DiskANN requires only 64GB RAM and an affordable SSD.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Vamana Construction
 &lt;div id="vamana-construction" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vamana-construction" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;Vamana iteratively builds a directed graph, starting from a random graph where each node represents a data point in the vector space. Initially, the graph is highly connected — all nodes are connected to each other. The graph is then optimized using an objective function that aims to maximize connectivity between the closest nodes. This is achieved by pruning most random short-range edges while adding certain long-range edges that connect distant nodes (to accelerate graph traversal)&lt;sup id="fnref1:37"&gt;&lt;a href="#fn:37" class="footnote-ref" role="doc-noteref"&gt;37&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/29f57b420d43.png" alt="image" /&gt;&lt;/p&gt;
&lt;p&gt;The figure shows 200 2D points after two iterations. The first iteration aggressively prunes edges but also removes long edges that reduce path length; when alpha is increased to relax the pruning condition, long edges are added back&lt;sup id="fnref:38"&gt;&lt;a href="#fn:38" class="footnote-ref" role="doc-noteref"&gt;38&lt;/a&gt;&lt;/sup&gt;. For the specific algorithm, refer to the paper — this is roughly the idea.&lt;/p&gt;

&lt;h4 class="relative group"&gt;The DiskANN Algorithm
 &lt;div id="the-diskann-algorithm" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-diskann-algorithm" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;From the paper&amp;rsquo;s &amp;ldquo;The DiskANN Index Design&amp;rdquo;:&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The high-level idea is simple: run Vamana on a dataset P and store the resulting graph on an SSD. At search time, whenever Algorithm 1 requires the out-neighbors of a point p, we simply fetch this information from the SSD. However, note that just storing the vector data for a billion points in 100 dimensions would far exceed the RAM on a workstation! This raises two questions: how do we build a graph over a billion points, and how do we do distance comparisons between the query point and points in our candidate list at search time in Algorithm 1, if we cannot even store the vector data?&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Run Vamana on the vector set and store it on SSD. When the dataset is very large, two problems must be addressed:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;em&gt;How to index such a large-scale dataset with limited memory resources?&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;k-means + Vamana stacking algorithm&lt;/strong&gt;: First, use k-means to partition the data into k clusters, then assign each point to the nearest i clusters. Usually, i=2 is sufficient. Build an in-memory Vamana index for each cluster, and finally merge the k Vamana indexes into one.&lt;/p&gt;
&lt;ol start="2"&gt;
&lt;li&gt;&lt;em&gt;If the original data cannot be loaded into memory, how to compute distances during search?&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Use compressed vectors (e.g., PQ) and store the compressed vectors in main memory.&lt;/p&gt;
&lt;p&gt;If index data is stored on SSD, disk access count and disk read/write requests must be minimized to ensure low search latency; at the same time, lossy compression reduces recall. Therefore, the DiskANN paper proposes three optimization strategies:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Beam Search&lt;/strong&gt;: Simply put, preload neighbor information. When searching for point p, if p&amp;rsquo;s neighbors are not in memory, they must be loaded from disk. Since the time required for a small number of random SSD accesses is roughly the same as the time for a single SSD sector access, the neighbor information of W unvisited points can be loaded in one batch. W should not be set too large or too small. Setting W too large wastes computational resources and SSD bandwidth, while setting it too small increases search latency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Caching Frequently Visited Vertices&lt;/strong&gt;: Aims to reduce disk access count. Cache all points within C hops from the starting point in memory. The value of C is best set between 3 and 4.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Implicit Re-Ranking Using Full-Precision Vectors&lt;/strong&gt;: Since PQ is lossy compression, PQ-based distance algorithms only approximate the actual distance. To eliminate this discrepancy, we store the distance from each point to all its neighbors — this is full-precision. As for the implementation principle, in simple terms, it also leverages disk loading efficiency.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Based on the paper, DiskANN&amp;rsquo;s execution efficiency and recall outperform IVF and HNSW:&lt;/p&gt;
&lt;p&gt;


&lt;img src="https://lastdba.com/img/csdn/e293f1c74241.png" alt="image" /&gt;&lt;/p&gt;

&lt;h2 class="relative group"&gt;References
 &lt;div id="references" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#references" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;Original article (Chinese): &lt;a href="https://lastdba.com/2024/08/12/%E5%90%91%E9%87%8F%E6%95%B0%E6%8D%AE%E5%BA%93%EF%BC%9A%E4%BB%8E0%E5%88%B0original-paper/" target="_blank" rel="noreferrer"&gt;向量数据库相关概念&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;&lt;div class="footnotes" role="doc-endnotes"&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id="fn:1"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2304.13712" target="_blank" rel="noreferrer"&gt;Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond&lt;/a&gt;&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:2"&gt;
&lt;p&gt;Chih-Hao Liu &lt;a href="https://tomohiroliu22.medium.com/66%E5%80%8B%E5%A4%A7%E5%9E%8B%E8%AA%9E%E8%A8%80%E6%A8%A1%E5%9E%8Bllm%E7%B6%93%E5%85%B8%E8%AB%96%E6%96%87-0fcdab74e822" target="_blank" rel="noreferrer"&gt;66 Classic LLM Papers&lt;/a&gt;&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:2" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:3"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2303.18223.pdf" target="_blank" rel="noreferrer"&gt;A Survey of Large Language Models&lt;/a&gt;&amp;#160;&lt;a href="#fnref:3" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:4"&gt;
&lt;p&gt;&lt;a href="https://juejin.cn/post/7346233811212386345" target="_blank" rel="noreferrer"&gt;一文讲清楚，AI、AGI、AIGC与AIGC、NLP、LLM，ChatGPT等概念&lt;/a&gt;&amp;#160;&lt;a href="#fnref:4" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:5"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Prompt_engineering" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Prompt_engineering&lt;/a&gt;&amp;#160;&lt;a href="#fnref:5" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:5" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:6"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/2005.11401" target="_blank" rel="noreferrer"&gt;RAG original paper&lt;/a&gt;&amp;#160;&lt;a href="#fnref:6" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:7"&gt;
&lt;p&gt;Jonathan Katz pgconfdev2024 &lt;a href="https://www.pgevents.ca/events/pgconfdev2024/sessions/session/1/slides/42/pgconfdev-2024-vectors.pdf" target="_blank" rel="noreferrer"&gt;Vectors: How to better support a nasty data type&lt;/a&gt;&amp;#160;&lt;a href="#fnref:7" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:8"&gt;
&lt;p&gt;OpenAI recommends using vector databases &lt;a href="https://openai.com/index/chatgpt-plugins/" target="_blank" rel="noreferrer"&gt;https://openai.com/index/chatgpt-plugins/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:8" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:9"&gt;
&lt;p&gt;&lt;a href="https://thedataquarry.com/posts/vector-db-1/" target="_blank" rel="noreferrer"&gt;Vector databases (1): What makes each one different?&lt;/a&gt;&amp;#160;&lt;a href="#fnref:9" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:10"&gt;
&lt;p&gt;&lt;a href="https://github.com/erikbern/ann-benchmarks" target="_blank" rel="noreferrer"&gt;Vector database performance comparison&lt;/a&gt;&amp;#160;&lt;a href="#fnref:10" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:11"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Vector_%28mathematics_and_physics%29" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics)&lt;/a&gt;&amp;#160;&lt;a href="#fnref:11" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:12"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Unit_vector" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Unit_vector&lt;/a&gt;&amp;#160;&lt;a href="#fnref:12" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:13"&gt;
&lt;p&gt;OpenAI on unit vector usage &lt;a href="https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions" target="_blank" rel="noreferrer"&gt;https://platform.openai.com/docs/guides/embeddings/frequently-asked-questions&lt;/a&gt;&amp;#160;&lt;a href="#fnref:13" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:14"&gt;
&lt;p&gt;Pinecone Natural Language Processing for Semantic Search &lt;a href="https://www.pinecone.io/learn/series/nlp/dense-vector-embeddings-nlp/" target="_blank" rel="noreferrer"&gt;https://www.pinecone.io/learn/series/nlp/dense-vector-embeddings-nlp/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:14" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:15"&gt;
&lt;p&gt;Yao Yuan &lt;a href="https://zhuanlan.zhihu.com/p/684643954" target="_blank" rel="noreferrer"&gt;A Casual Discussion of Various Spaces in Mathematics&lt;/a&gt;&amp;#160;&lt;a href="#fnref:15" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:16"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Euclidean_distance" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Euclidean_distance&lt;/a&gt;&amp;#160;&lt;a href="#fnref:16" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:17"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Taxicab_geometry" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Taxicab_geometry&lt;/a&gt;&amp;#160;&lt;a href="#fnref:17" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:18"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Minkowski_distance" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Minkowski_distance&lt;/a&gt;&amp;#160;&lt;a href="#fnref:18" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:19"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Sine_and_cosine" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Sine_and_cosine&lt;/a&gt;&amp;#160;&lt;a href="#fnref:19" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:20"&gt;
&lt;p&gt;Jonathan Katz pgconfeu2023 &lt;a href="https://www.postgresql.eu/events/pgconfeu2023/sessions/session/4592/slides/435/pgconfeu2023_vectors.pdf" target="_blank" rel="noreferrer"&gt;Vectors are the new JSON&lt;/a&gt;&amp;#160;&lt;a href="#fnref:20" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:20" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:21"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Jaccard_index" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Jaccard_index&lt;/a&gt;&amp;#160;&lt;a href="#fnref:21" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:22"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-part-5-locality-sensitive-hashing-lsh-76ae4b388203" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 5: Locality Sensitive Hashing (LSH)&lt;/a&gt;&amp;#160;&lt;a href="#fnref:22" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:22" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref2:22" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:23"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Hamming_distance" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Hamming_distance&lt;/a&gt;&amp;#160;&lt;a href="#fnref:23" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:24"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-part-6-random-projections-with-lsh-forest-f2e9b31dcc47" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 6: Random Projections with LSH Forest&lt;/a&gt; ↩&amp;#160;&lt;a href="#fnref:24" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:25"&gt;
&lt;p&gt;earthwjl &lt;a href="https://www.jianshu.com/p/172749e6116a" target="_blank" rel="noreferrer"&gt;Delaunay Triangulation Study Notes&lt;/a&gt;&amp;#160;&lt;a href="#fnref:25" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:26"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Delaunay_triangulation" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Delaunay_triangulation&lt;/a&gt;&amp;#160;&lt;a href="#fnref:26" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:27"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Voronoi_diagram" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Voronoi_diagram&lt;/a&gt;&amp;#160;&lt;a href="#fnref:27" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:28"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Precision_and_recall" target="_blank" rel="noreferrer"&gt;https://en.wikipedia.org/wiki/Precision_and_recall&lt;/a&gt;&amp;#160;&lt;a href="#fnref:28" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:29"&gt;
&lt;p&gt;Jianshu &lt;a href="https://www.jianshu.com/p/d4368c8f40cb" target="_blank" rel="noreferrer"&gt;LSH (Locality Sensitive Hashing) Algorithm&lt;/a&gt;&amp;#160;&lt;a href="#fnref:29" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:30"&gt;
&lt;p&gt;&lt;a href="https://arxiv.org/pdf/1603.09320" target="_blank" rel="noreferrer"&gt;HNSW Original Paper&lt;/a&gt;&amp;#160;&lt;a href="#fnref:30" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:31"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-part-4-hierarchical-navigable-small-world-hnsw-2aad4fe87d37" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 4: Hierarchical Navigable Small World (HNSW)&lt;/a&gt;&amp;#160;&lt;a href="#fnref:31" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:32"&gt;
&lt;p&gt;&lt;a href="https://www.pinecone.io/learn/series/faiss/vector-indexes/" target="_blank" rel="noreferrer"&gt;https://www.pinecone.io/learn/series/faiss/vector-indexes/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:32" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:32" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:33"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-knn-inverted-file-index-7cab80cc0e79" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 1: kNN &amp;amp; Inverted File Index&lt;/a&gt;&amp;#160;&lt;a href="#fnref:33" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:34"&gt;
&lt;p&gt;Vyacheslav Efimov &lt;a href="https://towardsdatascience.com/similarity-search-product-quantization-b2a1a6397701" target="_blank" rel="noreferrer"&gt;Similarity Search, Part 2: Product Quantization&lt;/a&gt;&amp;#160;&lt;a href="#fnref:34" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:34" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:35"&gt;
&lt;p&gt;&lt;a href="https://inria.hal.science/file/index/docid/514462/filename/paper_hal.pdf" target="_blank" rel="noreferrer"&gt;PQ Original Paper&lt;/a&gt;&amp;#160;&lt;a href="#fnref:35" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:35" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:36"&gt;
&lt;p&gt;Pinecone Faiss Manual &lt;a href="https://www.pinecone.io/learn/series/faiss/product-quantization/" target="_blank" rel="noreferrer"&gt;https://www.pinecone.io/learn/series/faiss/product-quantization/&lt;/a&gt;&amp;#160;&lt;a href="#fnref:36" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:37"&gt;
&lt;p&gt;&lt;a href="https://suhasjs.github.io/files/diskann_neurips19.pdf" target="_blank" rel="noreferrer"&gt;DiskANN Original Paper&lt;/a&gt;&amp;#160;&lt;a href="#fnref:37" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&amp;#160;&lt;a href="#fnref1:37" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id="fn:38"&gt;
&lt;p&gt;DiskANN, A Disk-based ANNS Solution with High Recall and High QPS on Billion-scale Dataset &lt;a href="https://milvus.io/blog/2021-09-24-diskann.md" target="_blank" rel="noreferrer"&gt;https://milvus.io/blog/2021-09-24-diskann.md&lt;/a&gt;&amp;#160;&lt;a href="#fnref:38" class="footnote-backref" role="doc-backlink"&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;</content:encoded></item><item><title>When Does VACUUM Truncate Empty Pages at the End of a Table?</title><link>https://lastdba.com/en/2024/08/12/when-does-vacuum-truncate-empty-pages-at-the-end-of-a-table/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/when-does-vacuum-truncate-empty-pages-at-the-end-of-a-table/</guid><description>&lt;h2 class="relative group"&gt;VACUUM Truncate
 &lt;div id="vacuum-truncate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vacuum-truncate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;TRUNCATE&amp;mdash;Specifies that &lt;code&gt;VACUUM&lt;/code&gt; should attempt to truncate off any empty pages at the end of the table and allow the disk space for the truncated pages to be returned to the operating system. This is normally the desired behavior and is the default unless the &lt;code&gt;vacuum_truncate&lt;/code&gt; option has been set to false for the table to be vacuumed. Setting this option to false may be useful to avoid &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock on the table that the truncation requires. This option is ignored if the &lt;code&gt;FULL&lt;/code&gt; option is used.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;VACUUM Truncate
 &lt;div id="vacuum-truncate" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vacuum-truncate" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;blockquote&gt;&lt;p&gt;TRUNCATE&amp;mdash;Specifies that &lt;code&gt;VACUUM&lt;/code&gt; should attempt to truncate off any empty pages at the end of the table and allow the disk space for the truncated pages to be returned to the operating system. This is normally the desired behavior and is the default unless the &lt;code&gt;vacuum_truncate&lt;/code&gt; option has been set to false for the table to be vacuumed. Setting this option to false may be useful to avoid &lt;code&gt;ACCESS EXCLUSIVE&lt;/code&gt; lock on the table that the truncation requires. This option is ignored if the &lt;code&gt;FULL&lt;/code&gt; option is used.&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;AKA, the truncate option in VACUUM is enabled by default. It removes empty pages at the end of the table, acquiring an AccessExclusiveLock (level 8 lock) on the table during the operation.&lt;/p&gt;
&lt;p&gt;Today I found that in a certain environment, after deleting all data with &lt;code&gt;DELETE FROM&lt;/code&gt;, neither autovacuum nor manual VACUUM reclaimed the space.&lt;/p&gt;
&lt;p&gt;Reproducing the issue:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzl1(a int);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;insert&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;into&lt;/span&gt; lzl1 &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; generate_series(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;) a;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;analyze&lt;/span&gt; lzl1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb=# select relname,relpages,reltuples from pg_class where relname=&amp;#39;lzl1&amp;#39;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname | relpages | reltuples
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;---------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl1 | 5 | 1000&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;relpages is 5, so the last page number is 4.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb=&amp;gt; select t_ctid,lp,case lp_flags when 0 then &amp;#39;0:LP_UNUSED&amp;#39; when 1 then &amp;#39;LP_NORMAL&amp;#39; when 2 then &amp;#39;LP_REDIRECT&amp;#39; when 3 then &amp;#39;LP_DEAD&amp;#39; end as lp_flags,t_xmin,t_xmax,t_field3 as t_cid, raw_flags, info.combined_flags,substring(t_data,0,40) from heap_page_items(get_raw_page(&amp;#39;lzl1&amp;#39;,4)) item,LATERAL heap_tuple_infomask_flags(t_infomask, t_infomask2) info order by lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid | lp | lp_flags | t_xmin | t_xmax | t_cid | raw_flags | combined_flags | substring
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;--------+----+-----------+--------+--------+-------+-----------------------------------------+----------------+------------
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (4,1) | 1 | LP_NORMAL | 772 | 0 | 0 | {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID} | {} | \x89030000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (4,2) | 2 | LP_NORMAL | 772 | 0 | 0 | {HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID} | {} | \x8a030000
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; ...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;delete&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; lzl1;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; t_ctid,lp,&lt;span style="color:#66d9ef"&gt;case&lt;/span&gt; lp_flags &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;0:LP_UNUSED&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_NORMAL&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;2&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_REDIRECT&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;when&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;3&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;then&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;LP_DEAD&amp;#39;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;end&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; lp_flags,t_xmin,t_xmax,t_field3 &lt;span style="color:#66d9ef"&gt;as&lt;/span&gt; t_cid, raw_flags, info.combined_flags,&lt;span style="color:#66d9ef"&gt;substring&lt;/span&gt;(t_data,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;40&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; heap_page_items(get_raw_page(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;)) item,&lt;span style="color:#66d9ef"&gt;LATERAL&lt;/span&gt; heap_tuple_infomask_flags(t_infomask, t_infomask2) info &lt;span style="color:#66d9ef"&gt;order&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; lp;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; t_ctid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp &lt;span style="color:#f92672"&gt;|&lt;/span&gt; lp_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmin &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_xmax &lt;span style="color:#f92672"&gt;|&lt;/span&gt; t_cid &lt;span style="color:#f92672"&gt;|&lt;/span&gt; raw_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; combined_flags &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;substring&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+----+-------------+--------+--------+-------+-----------+----------------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;:LP_UNUSED &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relpages,reltuples &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reltuples
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl2 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;5&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It looks like all dead tuples were reclaimed, but the space is still occupied — the pages were not freed. Why doesn&amp;rsquo;t it truncate when the table is completely empty? Let&amp;rsquo;s dig into this question.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Source Code Analysis of &lt;code&gt;should_attempt_truncation&lt;/code&gt;
 &lt;div id="source-code-analysis-of-should_attempt_truncation" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis-of-should_attempt_truncation" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;(Unless otherwise noted, the version referenced is PG 11.)&lt;/p&gt;
&lt;p&gt;In &lt;code&gt;vacuumlazy.c&lt;/code&gt; there&amp;rsquo;s a pithily named function &lt;code&gt;should_attempt_truncation&lt;/code&gt; — this is the function that decides whether truncation is needed:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;bool&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;should_attempt_truncation&lt;/span&gt;(LVRelStats &lt;span style="color:#f92672"&gt;*&lt;/span&gt;vacrelstats)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BlockNumber possibly_freeable;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	possibly_freeable &lt;span style="color:#f92672"&gt;=&lt;/span&gt; vacrelstats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rel_pages &lt;span style="color:#f92672"&gt;-&lt;/span&gt; vacrelstats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nonempty_pages;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (possibly_freeable &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		(possibly_freeable &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; REL_TRUNCATE_MINIMUM &lt;span style="color:#f92672"&gt;||&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		 possibly_freeable &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; vacrelstats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rel_pages &lt;span style="color:#f92672"&gt;/&lt;/span&gt; REL_TRUNCATE_FRACTION) &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		old_snapshot_threshold &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Where:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define REL_TRUNCATE_MINIMUM 1000
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define REL_TRUNCATE_FRACTION 16&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;So the conditions for truncation are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Number of empty trailing pages &amp;gt; 1000, &lt;strong&gt;or&lt;/strong&gt; number of empty trailing pages &amp;gt; 1/16 of total pages&lt;/li&gt;
&lt;li&gt;&lt;code&gt;old_snapshot_threshold &amp;lt; 0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first rule exists to avoid constantly truncating tiny bits of trailing empty pages — reclaiming that negligible space isn&amp;rsquo;t worth the time and the AccessExclusiveLock. It&amp;rsquo;s unnecessary.&lt;/p&gt;
&lt;p&gt;The second rule is explained as follows:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; Also don&lt;span style="color:#960050;background-color:#1e0010"&gt;&amp;#39;&lt;/span&gt;t attempt it &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; we are doing early pruning&lt;span style="color:#f92672"&gt;/&lt;/span&gt;vacuuming, because a
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; scan which cannot find a truncated heap page cannot determine that the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; snapshot is too old to read that page. We might be able to get away with
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; truncating all except one of the pages, setting its LSN &lt;span style="color:#a6e22e"&gt;to&lt;/span&gt; (at least) the
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; maximum of the truncated range &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; we also treated an index leaf tuple
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; pointing to a missing heap page as something to trigger the &lt;span style="color:#e6db74"&gt;&amp;#34;snapshot too&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; old&lt;span style="color:#e6db74"&gt;&amp;#34; error, but that seems fragile and seems like it deserves its own patch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; we consider it.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&amp;ldquo;Because VACUUM scanning cannot yet confirm whether page data has snapshot-too-old issues, and there are LSN and index page complications, the code logic looks fiddly. If this feature is needed, a dedicated patch would be required.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;OK, so it looks like the code simply doesn&amp;rsquo;t check whether a page actually has snapshot-too-old issues. It takes the blunt approach of checking &lt;code&gt;old_snapshot_threshold &amp;lt; 0&lt;/code&gt; — the database itself must have snapshot-too-old disabled before truncation is attempted.&lt;/p&gt;
&lt;p&gt;Going back to the earlier problem where VACUUM didn&amp;rsquo;t reclaim space: since &lt;code&gt;DELETE&lt;/code&gt; removed all data, the condition &amp;ldquo;empty trailing pages &amp;gt; 1/16 of total pages&amp;rdquo; was definitely satisfied. However, &lt;code&gt;old_snapshot_threshold&lt;/code&gt; was actually enabled in that environment:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; old_snapshot_threshold ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; old_snapshot_threshold
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;h&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Disabling &lt;code&gt;old_snapshot_threshold&lt;/code&gt; and then doing the delete-all + VACUUM will reclaim the space. Disabling &lt;code&gt;old_snapshot_threshold&lt;/code&gt; requires a database restart.&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- After restart
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;show&lt;/span&gt; old_snapshot_threshold ;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; old_snapshot_threshold
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16446&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;vacuum&lt;/span&gt; lzl1;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Pages successfully reclaimed
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=#&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; relname,relpages,reltuples &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; pg_class &lt;span style="color:#66d9ef"&gt;where&lt;/span&gt; relname&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; relname &lt;span style="color:#f92672"&gt;|&lt;/span&gt; relpages &lt;span style="color:#f92672"&gt;|&lt;/span&gt; reltuples
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;---------+----------+-----------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; lzl1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Table not rebuilt
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lzldb&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;select&lt;/span&gt; pg_relation_filepath(&lt;span style="color:#e6db74"&gt;&amp;#39;lzl1&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; pg_relation_filepath
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;----------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; base&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16384&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16446&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;All pages successfully reclaimed, table not rebuilt. Problem located.&lt;/p&gt;
&lt;p&gt;But to understand the VACUUM truncation mechanism more deeply, let&amp;rsquo;s continue to the next section.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Source Code Analysis of &lt;code&gt;lazy_truncate_heap&lt;/code&gt;
 &lt;div id="source-code-analysis-of-lazy_truncate_heap" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#source-code-analysis-of-lazy_truncate_heap" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Relying solely on &lt;code&gt;should_attempt_truncation&lt;/code&gt; to judge truncation isn&amp;rsquo;t rigorous enough. We also need to look at &lt;code&gt;lazy_truncate_heap&lt;/code&gt;, the function that actually performs truncation, which has additional checks:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; * lazy_truncate_heap - try to truncate off any empty pages at the end
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt; */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;static&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;lazy_truncate_heap&lt;/span&gt;(Relation onerel, LVRelStats &lt;span style="color:#f92672"&gt;*&lt;/span&gt;vacrelstats)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BlockNumber old_rel_pages &lt;span style="color:#f92672"&gt;=&lt;/span&gt; vacrelstats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;rel_pages;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	BlockNumber new_rel_pages;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;int&lt;/span&gt;			lock_retry;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/* Report that we are now truncating */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#a6e22e"&gt;pgstat_progress_update_param&lt;/span&gt;(PROGRESS_VACUUM_PHASE,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								 PROGRESS_VACUUM_PHASE_TRUNCATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 * Loop until no more truncating can be done.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;	 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	&lt;span style="color:#66d9ef"&gt;do&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		PGRUsage	ru0;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;pg_rusage_init&lt;/span&gt;(&lt;span style="color:#f92672"&gt;&amp;amp;&lt;/span&gt;ru0);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * We need full exclusive lock on the relation in order to do
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * truncation. If we can&amp;#39;t get it, give up rather than waiting --- we
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * don&amp;#39;t want to block other backends, and we don&amp;#39;t want to deadlock
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * (which is quite possible considering we already hold a lower-grade
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * lock).
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		Vacrelstats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lock_waiter_detected &lt;span style="color:#f92672"&gt;=&lt;/span&gt; false;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		lock_retry &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (true)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// If we can acquire the lock, break out of while
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#a6e22e"&gt;ConditionalLockRelation&lt;/span&gt;(onerel, AccessExclusiveLock))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;break&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * Check for interrupts while trying to (re-)acquire the exclusive
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 * lock.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;			 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;CHECK_FOR_INTERRUPTS&lt;/span&gt;();
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// If lock not immediately acquired, initially (++lock_retry)=1, &amp;lt;=100;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// when &amp;gt;100, give up truncation and return
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (&lt;span style="color:#f92672"&gt;++&lt;/span&gt;lock_retry &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; (VACUUM_TRUNCATE_LOCK_TIMEOUT &lt;span style="color:#f92672"&gt;/&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL))
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * We failed to establish the lock in the specified number of
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 * retries. This means we give up truncating.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;				 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				Vacrelstats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lock_waiter_detected &lt;span style="color:#f92672"&gt;=&lt;/span&gt; true;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#a6e22e"&gt;ereport&lt;/span&gt;(elevel,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;						(&lt;span style="color:#a6e22e"&gt;errmsg&lt;/span&gt;(&lt;span style="color:#e6db74"&gt;&amp;#34;&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;%s&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;\&amp;#34;&lt;/span&gt;&lt;span style="color:#e6db74"&gt;: stopping truncate due to conflicting lock request&amp;#34;&lt;/span&gt;,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;								&lt;span style="color:#a6e22e"&gt;RelationGetRelationName&lt;/span&gt;(onerel))));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;				&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;// Sleep 50ms. Looks a bit crude. Theoretical max wait: 50*100=5s
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;pg_usleep&lt;/span&gt;(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL &lt;span style="color:#f92672"&gt;*&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000L&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// After acquiring the exclusive lock, check if new tuples arrived during VACUUM.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If so, don&amp;#39;t truncate.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		new_rel_pages &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;RelationGetNumberOfBlocks&lt;/span&gt;(onerel);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (new_rel_pages &lt;span style="color:#f92672"&gt;!=&lt;/span&gt; old_rel_pages)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;UnlockRelation&lt;/span&gt;(onerel, AccessExclusiveLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		new_rel_pages &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#a6e22e"&gt;count_nondeletable_pages&lt;/span&gt;(onerel, vacrelstats);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// If new tuples were written during VACUUM, don&amp;#39;t truncate
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; (new_rel_pages &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; old_rel_pages)
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		{
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#75715e"&gt;/* can&amp;#39;t do anything after all */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#a6e22e"&gt;UnlockRelation&lt;/span&gt;(onerel, AccessExclusiveLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			&lt;span style="color:#66d9ef"&gt;return&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		}
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;/*
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 * Okay to truncate.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;		 */&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;RelationTruncate&lt;/span&gt;(onerel, new_rel_pages);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#75715e"&gt;// Release lock immediately after truncation
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;		&lt;span style="color:#a6e22e"&gt;UnlockRelation&lt;/span&gt;(onerel, AccessExclusiveLock);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;	} &lt;span style="color:#66d9ef"&gt;while&lt;/span&gt; (new_rel_pages &lt;span style="color:#f92672"&gt;&amp;gt;&lt;/span&gt; vacrelstats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;nonempty_pages &lt;span style="color:#f92672"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;			 vacrelstats&lt;span style="color:#f92672"&gt;-&amp;gt;&lt;/span&gt;lock_waiter_detected);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Where:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL 50 &lt;/span&gt;&lt;span style="color:#75715e"&gt;/* microseconds!! */&lt;/span&gt;&lt;span style="color:#75715e"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#define VACUUM_TRUNCATE_LOCK_TIMEOUT 5000 &lt;/span&gt;&lt;span style="color:#75715e"&gt;/* microseconds!! */&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The main function actually called is &lt;code&gt;RelationTruncate&lt;/code&gt;. The bulk of the preceding code is all about trying to acquire the AccessExclusiveLock. Beyond the two conditions mentioned earlier, truncation also won&amp;rsquo;t happen in these two cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Failed to acquire AccessExclusiveLock&lt;/li&gt;
&lt;li&gt;New data was written during the VACUUM&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 class="relative group"&gt;VACUUM Truncate May Wait Up to 5 Seconds
 &lt;div id="vacuum-truncate-may-wait-up-to-5-seconds" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vacuum-truncate-may-wait-up-to-5-seconds" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;While reading the &lt;code&gt;lazy_truncate_heap&lt;/code&gt; source code above, I noticed the lock acquisition retry loop has a somewhat crude wait:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-text" data-lang="text"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pg_usleep(VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL * 1000L);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Each loop iteration sleeps 50ms. The theoretical maximum wait is 50×100 = 5 seconds!&lt;/p&gt;
&lt;p&gt;Let&amp;rsquo;s test this wait time:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Window 1&lt;/th&gt;
 &lt;th&gt;Window 2&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;create table lzl2;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;alter table lzl2 set (autovacuum_enabled=off);;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;insert into lzl2 select generate_series(1,1000) a;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;delete from lzl2;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;begin;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;select * from lzl2;&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;\timing&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;td&gt;vacuum lzl2; &amp;ndash; Time: 5022.122 ms (00:05.022)&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;We can see the wait time is about 5 seconds.&lt;/p&gt;
&lt;p&gt;If you&amp;rsquo;re fast enough, you can open a third window and grab a pstack of session 2:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[postgres&lt;span style="color:#960050;background-color:#1e0010"&gt;@&lt;/span&gt;cncq081298 lzl]&lt;span style="color:#960050;background-color:#1e0010"&gt;$&lt;/span&gt; pstack &lt;span style="color:#ae81ff"&gt;4113&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#0 0x00002b92a978c013 in __select_nocancel () from /lib64/libc.so.6
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#1 0x000000000086225a in pg_usleep (microsec=microsec@entry=50000) at pgsleep.c:56
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#2 0x00000000005e8212 in lazy_truncate_heap (vacrelstats=0xfc4490, onerel=0x2b92a8bc88d8) at vacuumlazy.c:1861
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#3 lazy_vacuum_rel (onerel=onerel@entry=0x2b92a8bc88d8, options=options@entry=5, params=params@entry=0x7ffc96bb31d0, bstrategy=&amp;lt;optimized out&amp;gt;) at vacuumlazy.c:290
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#4 0x00000000005e4551 in vacuum_rel (relid=32778, relation=&amp;lt;optimized out&amp;gt;, options=options@entry=5, params=params@entry=0x7ffc96bb31d0) at vacuum.c:1572
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;#5 0x00000000005e55ac in vacuum (options=5, relations=0xfc6540, params=params@entry=0x7ffc96bb31d0, bstrategy=&amp;lt;optimized out&amp;gt;, bstrategy@entry=0x0, isTopLevel=isTopLevel@entry=true) at vacuum.c:340
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It reached &lt;code&gt;pg_usleep&lt;/code&gt; inside &lt;code&gt;lazy_truncate_heap&lt;/code&gt;, passing &lt;code&gt;entry=50000 microsec&lt;/code&gt;. In reality, &lt;code&gt;pg_usleep&lt;/code&gt; looped 100 times, total wait time 50000×100 microseconds = 5 seconds.&lt;/p&gt;
&lt;p&gt;Later, in PG 15, this code was improved by replacing &lt;code&gt;pg_usleep&lt;/code&gt; with &lt;code&gt;WaitLatch&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-c" data-lang="c"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#66d9ef"&gt;void&lt;/span&gt;) &lt;span style="color:#a6e22e"&gt;WaitLatch&lt;/span&gt;(MyLatch,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WL_LATCH_SET &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WL_TIMEOUT &lt;span style="color:#f92672"&gt;|&lt;/span&gt; WL_EXIT_ON_PM_DEATH,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;VACUUM_TRUNCATE_LOCK_WAIT_INTERVAL,
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;WAIT_EVENT_VACUUM_TRUNCATE);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#a6e22e"&gt;ResetLatch&lt;/span&gt;(MyLatch);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 class="relative group"&gt;VACUUM Truncate Summary
 &lt;div id="vacuum-truncate-summary" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#vacuum-truncate-summary" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Conditions for VACUUM to trigger truncation (all must be met):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Empty trailing pages &amp;gt; 1000, &lt;strong&gt;or&lt;/strong&gt; empty trailing pages &amp;gt; 1/16 of total pages&lt;/li&gt;
&lt;li&gt;&lt;code&gt;old_snapshot_threshold &amp;lt; 0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Before PG 15 (exclusive): must acquire AccessExclusiveLock within 5 seconds&lt;/li&gt;
&lt;li&gt;No new data written during the VACUUM&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;This article was originally published in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</content:encoded></item><item><title>Why Is 'partition of' Slow When There's No Blocking?</title><link>https://lastdba.com/en/2024/08/12/why-is-partition-of-slow-when-theres-no-blocking/</link><pubDate>Mon, 12 Aug 2024 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2024/08/12/why-is-partition-of-slow-when-theres-no-blocking/</guid><description>&lt;h4 class="relative group"&gt;Analyzing Slow &lt;code&gt;CREATE TABLE.. PARTITION OF&lt;/code&gt; Statements
 &lt;div id="analyzing-slow-create-table-partition-of-statements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analyzing-slow-create-table-partition-of-statements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;063&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;authentication&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41364668&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;connection authorized: user=user1 database=dblzl&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41364669&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;statement: -- a86fae372f73414bbe1af18213a47beb
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/*a86fae372f73414bbe1af18213a47beb */
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;create table if not exists table1_partition_p2406 partition of table1 for values from (&amp;#39;2024-06-01 00:00:00&amp;#39;) to (&amp;#39;2024-07-01 00:00:00&amp;#39;); &amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;38&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;555&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;CREATE TABLE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 2129483.549 ms&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The user &amp;lsquo;user1&amp;rsquo; connected to the database at 22:02:59 and immediately executed a &lt;code&gt;create table.. partition of..&lt;/code&gt; statement, which didn&amp;rsquo;t complete until 22:38:28. The logs in between are omitted — there was a lot of session blocking information, with session 125889 as the blocking source.&lt;/p&gt;</description><content:encoded>
&lt;h4 class="relative group"&gt;Analyzing Slow &lt;code&gt;CREATE TABLE.. PARTITION OF&lt;/code&gt; Statements
 &lt;div id="analyzing-slow-create-table-partition-of-statements" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#analyzing-slow-create-table-partition-of-statements" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;063&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;2&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;authentication&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41364668&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;connection authorized: user=user1 database=dblzl&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;079&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;3&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;idle&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;41364669&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;statement: -- a86fae372f73414bbe1af18213a47beb
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;/*a86fae372f73414bbe1af18213a47beb */
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#e6db74"&gt;create table if not exists table1_partition_p2406 partition of table1 for values from (&amp;#39;2024-06-01 00:00:00&amp;#39;) to (&amp;#39;2024-07-01 00:00:00&amp;#39;); &amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;38&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;28&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;555&lt;/span&gt; CST,&lt;span style="color:#e6db74"&gt;&amp;#34;user1&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;dblzl&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;30.88.79.3:37423&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;66461213&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt;ebc1,&lt;span style="color:#ae81ff"&gt;4&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;CREATE TABLE&amp;#34;&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;2024&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;05&lt;/span&gt;&lt;span style="color:#f92672"&gt;-&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;16&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;22&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;02&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;59&lt;/span&gt; CST,&lt;span style="color:#ae81ff"&gt;34&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,&lt;span style="color:#ae81ff"&gt;0&lt;/span&gt;,LOG,&lt;span style="color:#ae81ff"&gt;00000&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;duration: 2129483.549 ms&amp;#34;&lt;/span&gt;,,,,,,,,,&lt;span style="color:#e6db74"&gt;&amp;#34;&amp;#34;&lt;/span&gt;,&lt;span style="color:#e6db74"&gt;&amp;#34;client backend&amp;#34;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The user &amp;lsquo;user1&amp;rsquo; connected to the database at 22:02:59 and immediately executed a &lt;code&gt;create table.. partition of..&lt;/code&gt; statement, which didn&amp;rsquo;t complete until 22:38:28. The logs in between are omitted — there was a lot of session blocking information, with session 125889 as the blocking source.&lt;/p&gt;
&lt;p&gt;Blocked sessions looked like:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;process &lt;span style="color:#ae81ff"&gt;33569&lt;/span&gt; still waiting &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; RowExclusiveLock &lt;span style="color:#66d9ef"&gt;on&lt;/span&gt; relation &lt;span style="color:#ae81ff"&gt;53733&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;17073&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;after&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;1000&lt;/span&gt;.&lt;span style="color:#ae81ff"&gt;048&lt;/span&gt; ms&lt;span style="color:#e6db74"&gt;&amp;#34;,&amp;#34;&lt;/span&gt;Process holding the &lt;span style="color:#66d9ef"&gt;lock&lt;/span&gt;: &lt;span style="color:#ae81ff"&gt;125889&lt;/span&gt;. Wait queue: &lt;span style="color:#ae81ff"&gt;33569&lt;/span&gt;.&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;When &lt;code&gt;PARTITION OF&lt;/code&gt; adds a partition, it acquires an AccessExclusiveLock (level 8) on the parent table, which blocks all operations on the partitioned table. Normally, adding a partition via &lt;code&gt;PARTITION OF&lt;/code&gt; is very fast, and the lock is released immediately. However, if there&amp;rsquo;s a long-running transaction on the partitioned table, the level 8 lock on the parent table must wait, causing subsequent blocking.&lt;/p&gt;
&lt;p&gt;(Stolen from &lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655" target="_blank" rel="noreferrer"&gt;my own diagram&lt;/a&gt;):



&lt;img src="https://lastdba.com/img/csdn/6c7f70fc3b60.png" alt="diagram" /&gt;&lt;/p&gt;
&lt;p&gt;However, in this case there was no long transaction on the table, yet &lt;code&gt;PARTITION OF&lt;/code&gt; took 35 minutes.&lt;/p&gt;
&lt;p&gt;From historical process information, this process was in D state (uninterruptible sleep), which was suspicious. Initially, I suspected memory or disk issues, but after investigation, everything was normal.&lt;/p&gt;
&lt;p&gt;However, this problem was easy to reproduce — running &lt;code&gt;create table partition of&lt;/code&gt; directly in a simulation environment was very slow. pg_stat_activity showed the statement waiting on IO:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event_type &lt;span style="color:#f92672"&gt;|&lt;/span&gt; IO
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;wait_event &lt;span style="color:#f92672"&gt;|&lt;/span&gt; DataFileRead
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;state&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; active
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;query &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; xxx partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; xx &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2025-05-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2025-06-01 00:00:00&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;strace tracing revealed the process was heavily reading one file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;pread64(&lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;, &lt;span style="color:#e6db74"&gt;&amp;#34;\22\2\0\0\220w\321&amp;gt;\0\0\5\0\24\0018\1\0 \4 \0\0\0\0\200\237\0\1\310\236p\1&amp;#34;&lt;/span&gt;..., &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;, &lt;span style="color:#ae81ff"&gt;863485952&lt;/span&gt;) &lt;span style="color:#f92672"&gt;=&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;8192&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Using file descriptor 53, we identified the file:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;[&lt;span style="color:#f92672"&gt;/&lt;/span&gt;proc&lt;span style="color:#f92672"&gt;/&lt;/span&gt;&lt;span style="color:#ae81ff"&gt;356174&lt;/span&gt;&lt;span style="color:#f92672"&gt;/&lt;/span&gt;fd] ll &lt;span style="color:#f92672"&gt;|&lt;/span&gt;grep &lt;span style="color:#ae81ff"&gt;53&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;lrwx&lt;span style="color:#75715e"&gt;------ 1 postgres postgres 64 May 17 15:34 53 -&amp;gt; /lzl/pglzl/data/base/17076/25883&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;oid2name &lt;span style="color:#f92672"&gt;-&lt;/span&gt;d lzldb &lt;span style="color:#f92672"&gt;-&lt;/span&gt;f &lt;span style="color:#ae81ff"&gt;25883&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;From&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;database&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#34;lzldb&amp;#34;&lt;/span&gt;:
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; Filenode &lt;span style="color:#66d9ef"&gt;Table&lt;/span&gt; Name
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-----------------------------------------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#ae81ff"&gt;25883&lt;/span&gt; table_partition_default&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Finally located: the table &lt;code&gt;table_partition_default&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;d&lt;span style="color:#f92672"&gt;+&lt;/span&gt; table_partition_default
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt;: table_partition_default &lt;span style="color:#66d9ef"&gt;DEFAULT&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;Partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt;: (&lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; ((date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2022-05-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;OR&lt;/span&gt; ((date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2022-05-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (da
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#960050;background-color:#1e0010"&gt;\&lt;/span&gt;dt&lt;span style="color:#f92672"&gt;+&lt;/span&gt; table_partition_default
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; List &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; relations
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;Schema&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Name &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Type&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Owner&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Persistence &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;Size&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; Description 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;--------+------------------------------------+-------+------------+-------------+-------+-------------
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; table_partition_default &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#f92672"&gt;|&lt;/span&gt; user1 &lt;span style="color:#f92672"&gt;|&lt;/span&gt; permanent &lt;span style="color:#f92672"&gt;|&lt;/span&gt; &lt;span style="color:#ae81ff"&gt;50&lt;/span&gt; GB &lt;span style="color:#f92672"&gt;|&lt;/span&gt; 
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;(&lt;span style="color:#ae81ff"&gt;1&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;It was the default partition table, with tens of GB of data. Oracle DBAs might find this unfamiliar — PG&amp;rsquo;s default partition receives data that doesn&amp;rsquo;t fall into any defined partition range. The default partition ensures data is still accepted even if no matching range is defined.&lt;/p&gt;
&lt;p&gt;If data exists in the default partition and a new partition needs to cover that range, what happens? It directly throws an error:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#f92672"&gt;=&amp;gt;&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;create&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;if&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;not&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;exists&lt;/span&gt; table_partition_pxxxx partition &lt;span style="color:#66d9ef"&gt;of&lt;/span&gt; table_partition &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-12 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-01-13 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;ERROR: &lt;span style="color:#ae81ff"&gt;23514&lt;/span&gt;: updated partition &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;default&lt;/span&gt; partition &lt;span style="color:#e6db74"&gt;&amp;#34;table_partition_default&amp;#34;&lt;/span&gt; would be violated &lt;span style="color:#66d9ef"&gt;by&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;some&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;row&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;SCHEMA&lt;/span&gt; NAME: &lt;span style="color:#66d9ef"&gt;public&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; NAME: table_partition_default
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;LOCATION&lt;/span&gt;: check_default_partition_contents, partbounds.&lt;span style="color:#66d9ef"&gt;c&lt;/span&gt;:&lt;span style="color:#ae81ff"&gt;3227&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As you can see, when adding a child partition, the default partition&amp;rsquo;s partition constraint is automatically modified. The default partition constraint check is essentially validating the default partition&amp;rsquo;s data against the new partition&amp;rsquo;s range.&lt;/p&gt;
&lt;p&gt;At this point, the cause is clear:&lt;/p&gt;
&lt;p&gt;When adding a new child partition to a partitioned table, the partition creation statement needs to validate data in the default partition to ensure the new partition&amp;rsquo;s data range doesn&amp;rsquo;t conflict with existing default partition data. This caused &lt;code&gt;CREATE TABLE PARTITION OF&lt;/code&gt; to read a massive amount of default partition data, preventing the new partition from being created. The blocking then cascaded, making business data unqueryable and unwritable.&lt;/p&gt;

&lt;h4 class="relative group"&gt;Summary and Recommendations
 &lt;div id="summary-and-recommendations" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#summary-and-recommendations" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h4&gt;
&lt;p&gt;PostgreSQL partitioned tables are becoming increasingly common. Maintaining partitions requires attention to many details. I recommend reading &lt;a href="https://blog.csdn.net/qq_40687433/article/details/132525655" target="_blank" rel="noreferrer"&gt;PostgreSQL Partitioned Tables&lt;/a&gt;, which covers almost everything.&lt;/p&gt;
&lt;p&gt;In this case, the key to resolution is the data in the default partition. Before refactoring the default partition, do not use &lt;code&gt;PARTITION OF&lt;/code&gt; to create child partitions.&lt;/p&gt;
&lt;p&gt;Default partition refactoring plan:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Detach the default child partition, then properly create child partitions, and reinsert the default table data back into the partitioned table.&lt;/li&gt;
&lt;li&gt;If necessary, after detaching and creating proper child partitions, create an empty default partition to maintain business data continuity.&lt;/li&gt;
&lt;li&gt;Note that detach differs from attach — detach requires a level 8 lock on the parent table. PG14 supports &lt;code&gt;DETACH CONCURRENTLY&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If you don&amp;rsquo;t refactor the default partition, check the current data range in the default partition. Using &lt;code&gt;ATTACH&lt;/code&gt; to add child partitions will be slow, but won&amp;rsquo;t block reads and writes.&lt;/p&gt;
&lt;p&gt;Finally, a review of best practices for adding partitions:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;PARTITION OF&lt;/code&gt; requires a level 8 lock on the parent table, which carries risk. The recommended approach is to use &lt;code&gt;ATTACH&lt;/code&gt; to add new child partitions (partition indexes can be handled similarly). This does not block reads and writes, has no business impact, and can be done online.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The correct approach for adding new partitions&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1 attach partition LZLPARTITION1_202303 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If the new partition already has data, ATTACH may still be slow. You can optimize by pre-creating constraints:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The correct approach for adding a partition that already has data&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight-wrapper"&gt;&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sql" data-lang="sql"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Reduce verbose DDL by using LIKE
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;CREATE&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;TABLE&lt;/span&gt; lzlpartition1_202303
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt; (&lt;span style="color:#66d9ef"&gt;LIKE&lt;/span&gt; lzlpartition1 &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;DEFAULTS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;INCLUDING&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;CONSTRAINTS&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Skip this step if no data exists. Add a CHECK constraint referencing other partitions&amp;#39; Partition constraint to reduce ATTACH constraint validation time.
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#66d9ef"&gt;add&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202303 &lt;span style="color:#66d9ef"&gt;CHECK&lt;/span&gt; ((date_created &lt;span style="color:#66d9ef"&gt;IS&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NOT&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;NULL&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;gt;=&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;AND&lt;/span&gt; (date_created &lt;span style="color:#f92672"&gt;&amp;lt;&lt;/span&gt; &lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;::&lt;span style="color:#66d9ef"&gt;timestamp&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;without&lt;/span&gt; time &lt;span style="color:#66d9ef"&gt;zone&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Add partition via ATTACH
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; LZLPARTITION1 attach partition LZLPARTITION1_202303 &lt;span style="color:#66d9ef"&gt;for&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;values&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;from&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-03-01 00:00:00&amp;#39;&lt;/span&gt;) &lt;span style="color:#66d9ef"&gt;to&lt;/span&gt; (&lt;span style="color:#e6db74"&gt;&amp;#39;2023-04-01 00:00:00&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#75715e"&gt;-- Optional. Before transactions occur on the new partition, drop the extra CHECK constraint
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#66d9ef"&gt;alter&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;table&lt;/span&gt; lzlpartition1_202303 &lt;span style="color:#66d9ef"&gt;drop&lt;/span&gt; &lt;span style="color:#66d9ef"&gt;constraint&lt;/span&gt; chk_202303;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content:encoded></item><item><title>Chatting About American TV Shows — June 2023</title><link>https://lastdba.com/en/2023/06/01/chatting-about-american-tv-shows-june-2023/</link><pubDate>Thu, 01 Jun 2023 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/2023/06/01/chatting-about-american-tv-shows-june-2023/</guid><description>&lt;p&gt;​
I just finished watching the &lt;em&gt;Yellowstone&lt;/em&gt; series and decided to write a bit about the American shows I&amp;rsquo;ve watched recently — eleven in total. Here&amp;rsquo;s a quick review of each.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Yellowstone
 &lt;div id="yellowstone" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#yellowstone" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Yellowstone&lt;/em&gt; is already at its fifth season, and it looks like they&amp;rsquo;ll keep going. When I first started watching, I genuinely got hooked — a beautiful, grand series with stunning cinematography and gorgeous scenery. Plus, you get to see how real American ranchers herd cattle — actual ranchers really do have that old-money landowner vibe&amp;hellip;&lt;/p&gt;</description><content:encoded>&lt;p&gt;​
I just finished watching the &lt;em&gt;Yellowstone&lt;/em&gt; series and decided to write a bit about the American shows I&amp;rsquo;ve watched recently — eleven in total. Here&amp;rsquo;s a quick review of each.&lt;/p&gt;

&lt;h2 class="relative group"&gt;Yellowstone
 &lt;div id="yellowstone" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#yellowstone" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;Yellowstone&lt;/em&gt; is already at its fifth season, and it looks like they&amp;rsquo;ll keep going. When I first started watching, I genuinely got hooked — a beautiful, grand series with stunning cinematography and gorgeous scenery. Plus, you get to see how real American ranchers herd cattle — actual ranchers really do have that old-money landowner vibe&amp;hellip;&lt;/p&gt;
&lt;p&gt;Season one&amp;rsquo;s plot holds up fine — the dynamics between the Dutton family, the Native Americans, the state government, and the developers work well, and you can casually enjoy watching cowboys herd cattle along the way. But the plot in later seasons&amp;hellip; is unexpectedly bad. Downright incomprehensible. It lowers the bar for screenwriting.&lt;/p&gt;
&lt;p&gt;Zooming into the show&amp;rsquo;s core: why do so many people love this series? Because &lt;em&gt;Yellowstone&lt;/em&gt; doesn&amp;rsquo;t just depict authentic cowboy life (they even filmed some genuine ranch cowboy life later on) — it also reflects the harsh reality that old ranches can barely survive under modern societal development. And cowboy culture and private land are the very heart of American identity. It&amp;rsquo;s not just the Dutton family stubbornly trying to preserve the ranching way of life — it almost feels like a clash between urban American development and native cultural preservation.&lt;/p&gt;
&lt;p&gt;I can responsibly say: the plot definitely gets worse with each season — so bad that the main storyline becomes unwatchable. But if they release more seasons, this show will still be my top priority over everything else.&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;1923
 &lt;div id="1923" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1923" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A &lt;em&gt;Yellowstone&lt;/em&gt; prequel series. Maybe because &lt;em&gt;Yellowstone&lt;/em&gt; is so famous, this prequel &lt;em&gt;1923&lt;/em&gt; ended up with a bit too many stars — it left a bad impression from the start. Perhaps the creators thought &lt;em&gt;Yellowstone&lt;/em&gt;&amp;rsquo;s plot wasn&amp;rsquo;t good enough, and that a show purely about cowboys would be hard to craft a compelling script for, so they added two subplots to &lt;em&gt;1923&lt;/em&gt;. But adding subplots created another problem: the show doesn&amp;rsquo;t feel enough like &lt;em&gt;Yellowstone&lt;/em&gt;. Constant cutting between storylines — no &amp;ldquo;slow-paced&amp;rdquo; &lt;em&gt;Yellowstone&lt;/em&gt; vibe.&lt;/p&gt;
&lt;p&gt;The Native American girl&amp;rsquo;s storyline seems completely disconnected from the main plot — no idea when it&amp;rsquo;ll tie in. But this Native girl subplot is actually pretty good. Native lands were stolen, and their children were sent to boarding schools to be forcibly indoctrinated with white Christian beliefs. This subplot genuinely carries the &lt;em&gt;Yellowstone&lt;/em&gt; spirit. The Native characters are cold-blooded killers too — none of that &amp;ldquo;bullet in the body but still politicking&amp;rdquo; dissonance. The narrative flows smoothly without dragging; this subplot is quite watchable.&lt;/p&gt;
&lt;p&gt;As for the Africa subplot&amp;hellip; while they do capture some scenery, it&amp;rsquo;s just not as good as the Dutton ranch — doesn&amp;rsquo;t have that feeling. And once they leave Africa, it starts dragging, heavily focusing on a grand romance set against the era&amp;rsquo;s backdrop — but what does that have to do with &lt;em&gt;Yellowstone&lt;/em&gt;? And this storyline waited an entire season without converging into the main plot&amp;hellip; An entire season of setup for one character, framed as &amp;ldquo;the Dutton ranch&amp;rsquo;s hope rests on him&amp;rdquo; — the stakes are too high, and the subplot itself isn&amp;rsquo;t that compelling. Season two is highly likely to be a massive flop.&lt;/p&gt;
&lt;p&gt;The early part of &lt;em&gt;1923&lt;/em&gt; still had some ranch-versus-the-tide-of-history flavor. Later it&amp;rsquo;s pure padding — they don&amp;rsquo;t even film cattle herding anymore. Completely devoid of interest. Can&amp;rsquo;t even muster a decent fight. Kind of bad. Only eight episodes in the whole season, and the plot starts falling apart halfway through — didn&amp;rsquo;t learn anything from &lt;em&gt;Yellowstone&lt;/em&gt; except how to botch the ending.&lt;/p&gt;
&lt;p&gt;You can tell this show wanted to inherit &lt;em&gt;Yellowstone&lt;/em&gt; but also try something new — depicting that era&amp;rsquo;s America and Europe (even colonial Africa) — but ended up being a mess of everything and nothing. If you want to revisit that era, I recommend &lt;em&gt;Boardwalk Empire&lt;/em&gt;, which is set around the same time (Prohibition era) and has far more period atmosphere than this show.&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;1883
 &lt;div id="1883" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#1883" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;1883&lt;/em&gt; — a grand, tragic Western epic. A &lt;em&gt;Yellowstone&lt;/em&gt; series entry, the prequel to the prequel. It feels like watching an epic saga, leaving you wanting more. It&amp;rsquo;s no longer just a simple TV show — the cinematography even has literary and artistic qualities, while also carrying a slice of American pioneering history. The U.S. had just emerged from the Civil War, everything was waiting to be rebuilt&amp;hellip;&lt;/p&gt;
&lt;p&gt;I personally really enjoy shows like &lt;em&gt;Yellowstone&lt;/em&gt; — the filming style suits my taste. But the main series plot is aggressively terrible; I&amp;rsquo;d rather just watch them ride horses on the ranch and skip the main storyline entirely. &lt;em&gt;1883&lt;/em&gt; fills that gap perfectly — not too much complex plot (but not too little either), just right. Look at the valley, look at the horses, add some epic BGM, and the immersion is strong.&lt;/p&gt;
&lt;p&gt;The entire &lt;em&gt;1883&lt;/em&gt; series doesn&amp;rsquo;t actually have much plot, but it tells a very complete story. America had just ended its Civil War, in an era of lawlessness — cowboys, bandits, sheriffs, European immigrants, Native Americans&amp;hellip; There&amp;rsquo;s some classic cowboy shootout action, but the focus is more on cowboy life and immigrants&amp;rsquo; yearning for freedom. Yet the road to freedom is full of hardship: horse thieves, Native tribes, rattlesnakes, tornadoes, and this unforgiving land. A deeply profound show. Other than the female lead&amp;rsquo;s runny nose being a minus, there&amp;rsquo;s nothing to criticize. The plot is that rare combination of complete and perfectly proportioned. Very, very highly recommended.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s a favorite line describing cowboys:&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;Tulsa King
 &lt;div id="tulsa-king" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#tulsa-king" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A pure thrill ride. Starring 70-something Sylvester Stallone as an old-school mobster who&amp;rsquo;s been locked up for decades, now reasserting order over a small city&amp;rsquo;s underworld. &amp;ldquo;It&amp;rsquo;s not that I can&amp;rsquo;t adapt — it&amp;rsquo;s that people today have messed-up rules.&amp;rdquo; Us old-school gangsters follow a code~ The plot has no real flaws, no dragging — just pure entertainment. Not sure if they&amp;rsquo;ll keep making more.&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;Wednesday
 &lt;div id="wednesday" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wednesday" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A fun watch, pretty decent. I&amp;rsquo;d never seen a gothic Lolita-style American show before, and it looks pretty good. The early parts are quite engaging and fresh. Later, when it leans into mystery, it falls off — everyone can tell who&amp;rsquo;s behind it, except Wednesday (the main character)&amp;hellip; (A lot of American mystery shows are like this — start strong, then gradually fall apart.) If you&amp;rsquo;ve never tried the gothic Lolita style, give it a shot.&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Last of Us
 &lt;div id="the-last-of-us" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-last-of-us" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Adapted from the video game of the same name — which I somehow never played! Precisely because I hadn&amp;rsquo;t played it, I could watch the show with a calm mind. Starring the hugely popular Lyanna Mormont (Bella Ramsey) and Oberyn Martell (Pedro Pascal) from &lt;em&gt;Game of Thrones&lt;/em&gt; — both deliver smooth, natural performances. It&amp;rsquo;s a post-apocalyptic zombie-type show, but the zombies aren&amp;rsquo;t from a virus — they&amp;rsquo;re from a fungal infection. The zombies&amp;rsquo; brains are full of fungus. One memorable scene: Bella&amp;rsquo;s character cuts open the head of a zombie stuck between rocks, and the fungus inside spills out — still alive. Maybe because of the fungus element, it&amp;rsquo;s more satisfying than the average zombie show. The visuals are great — not dark and murky, and not overly disgusting. A complete, well-told story with excellent cinematography. There&amp;rsquo;s one segment near the end that personally left me with some psychological discomfort, but overall the plot absolutely holds up. Several smaller storylines are beautifully told. Very good overall, highly recommended.&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;Boardwalk Empire
 &lt;div id="boardwalk-empire" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#boardwalk-empire" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;A series spanning four seasons, now complete. Set in 1920s America, right after Prohibition was enacted. Women stood outside bars calling for their rights; politicians publicly supported women while privately running bootlegging operations; gangsters stepped out of cars in trench coats, Thompson submachine guns blazing&amp;hellip; Boardwalk Empire is about the gangster empire of Atlantic City (just below New York), built on bootlegging into a wealth rivaling nations. I imagine many have seen &lt;em&gt;Once Upon a Time in America&lt;/em&gt; — you can roughly think of this show as its TV series counterpart. This one is hard to summarize — let&amp;rsquo;s go season by season.&lt;/p&gt;
&lt;p&gt;Season one is god-tier. Plenty of risqué scenes, and the plot isn&amp;rsquo;t just smooth — it&amp;rsquo;s miraculous. Women, black communities, bootlegging, jazz, gang wars, WWI veterans&amp;hellip; Gangsters have essentially seized control of the city — even the newspapers don&amp;rsquo;t care what the mayor says.&lt;/p&gt;
&lt;p&gt;Season two is a direct continuation of season one — also excellent.&lt;/p&gt;
&lt;p&gt;Season three introduces problems. It doesn&amp;rsquo;t feel like a continuation of the first two seasons (though some plot threads connect) — it could almost stand alone. Is the plot bad? Yes, it&amp;rsquo;s disconnected. But is it terrible? Taken on its own, it&amp;rsquo;s not flawed — it&amp;rsquo;s even somewhat entertaining. This season has many brilliant segments: Half-Face taking on ten men alone, the jaw-dropping plotline of the formidable madam, extended solo blues performances by black characters — all superb!&lt;/p&gt;
&lt;p&gt;Season four is full of issues. I thought my favorite character, dormant for three seasons, would finally take center stage and do something meaningful — instead, he was hastily written off. Dear writers, if that&amp;rsquo;s how it was going to be, could you not have put him on the poster? Made it seem like something big was coming — got my hopes up for nothing&amp;hellip; Season four&amp;rsquo;s protagonist has risen too high, making it hard to drive the plot (you could already feel this in season three). The only highlight of season four is the protagonist&amp;rsquo;s childhood flashbacks — a perfect closure to his arc.&lt;/p&gt;
&lt;p&gt;Many characters&amp;rsquo; later arcs are unsatisfying, but many characters&amp;rsquo; mid-series arcs are just too brilliant&amp;hellip; Although this show isn&amp;rsquo;t hugely popular, it did win awards, and you can see many scenes being referenced by later, higher-profile American shows. For example, Gus Fring&amp;rsquo;s arc in &lt;em&gt;Breaking Bad&lt;/em&gt; borrows from Half-Face; King Tommen&amp;rsquo;s suicide in &lt;em&gt;Game of Thrones&lt;/em&gt; &amp;ldquo;completely&amp;rdquo; borrows from the butler&amp;rsquo;s suicide&amp;hellip;&lt;/p&gt;
&lt;p&gt;I really love this show — it immerses you in the glamorous cities of that era, the decadent urban life, the jazz of underground speakeasies, the gangsters&amp;hellip; A narrative that holds nothing back (I mean that about everything). The series as a whole is excellent, rich with period atmosphere.&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;Band of Brothers
 &lt;div id="band-of-brothers" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#band-of-brothers" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m sure many have heard of this show&amp;rsquo;s reputation. Yes — I somehow hadn&amp;rsquo;t seen it. My elementary-school-level writing ability and limited education prevent me from offering any meaningful critique. Only one word can describe it: divine. I&amp;rsquo;ll find a chance to watch it again~&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Pacific
 &lt;div id="the-pacific" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-pacific" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;The Pacific&lt;/em&gt; was made shortly before &lt;em&gt;Band of Brothers&lt;/em&gt;. It&amp;rsquo;s actually a very good show, but then that monster came along, and this one&amp;rsquo;s reputation never reached the same heights. &lt;em&gt;Band of Brothers&lt;/em&gt; covers the European theater of WWII; this show covers the Pacific theater. Strangely, the two shows mirror their respective theaters — the European theater is far better known, and the shows follow suit&amp;hellip; Even within the show, at the same dinner table, a European theater soldier shows off a captured Nazi banner while the Pacific theater soldier has nothing to show — a touch of melancholy.&lt;/p&gt;
&lt;p&gt;Though not as famous, this is a very, very highly recommended show.&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;The Mandalorian
 &lt;div id="the-mandalorian" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-mandalorian" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;&lt;em&gt;The Mandalorian&lt;/em&gt; is already at its third season. Also starring Pedro Pascal, also a dad-with-kid storyline&amp;hellip; The first two seasons were quite good and fairly popular. This third season? Not so much. The Mandalorian should probably remain a ronin-like figure driving the plot forward — a whole group of Mandalorians building a homeland just doesn&amp;rsquo;t feel right. The protagonist&amp;rsquo;s identity even gets a bit diluted. (Run, man — take the kid and adventure across the galaxy — isn&amp;rsquo;t that better?)&lt;/p&gt;
&lt;p&gt;My appreciation for this show is premised on liking the Star Wars universe. In China, Star Wars fans are genuinely rare. If you&amp;rsquo;re not into it, you probably won&amp;rsquo;t get through it — feel free to skip.&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;The White Tower (Shiroi Kyoto)
 &lt;div id="the-white-tower-shiroi-kyoto" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-white-tower-shiroi-kyoto" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;This one is a Japanese drama. I want to end with it, because it truly is exceptional — near perfect. Though it&amp;rsquo;s somewhat old, it never feels boring while watching. Many ideas are surprisingly forward-thinking, the plot rises and falls dramatically, good and evil are never absolute, and several female characters are beautifully drawn. You&amp;rsquo;ll see some classic love triangles and plot twists, and revisiting them is still quite rewarding. Professor Zaizen&amp;rsquo;s final act brings the entire series to a perfect close. Japanese drama — number one!&lt;/p&gt;
&lt;p&gt;Personal rating: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;
&lt;p&gt;Recommended: ⭐️⭐️⭐️⭐️⭐️&lt;/p&gt;

&lt;h2 class="relative group"&gt;Closing
 &lt;div id="closing" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#closing" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;All of these are worth watching, and many are masterpieces. Some shows I couldn&amp;rsquo;t find subtitled versions for, so I watched them raw — like the &lt;em&gt;Yellowstone&lt;/em&gt; prequel &lt;em&gt;1883&lt;/em&gt;. Since the dialogue wasn&amp;rsquo;t overly complex, I managed to get through it (the narration is quite sophisticated)&amp;hellip; Marking my first raw viewing.&lt;/p&gt;
&lt;p&gt;These are basically all the shows I&amp;rsquo;ve watched in the last half year or so, so I&amp;rsquo;m bundling them together. There are many other brilliant shows from earlier that left a deep impression — I&amp;rsquo;ll save that for another time when I&amp;rsquo;m in the mood~&lt;/p&gt;
&lt;p&gt;Hoping to find more good shows in the second half of the year.&lt;/p&gt;
&lt;p&gt;​&lt;/p&gt;</content:encoded></item><item><title>About</title><link>https://lastdba.com/en/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://lastdba.com/en/about/</guid><description>&lt;h2 class="relative group"&gt;The Last DBA
 &lt;div id="the-last-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-last-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Hi, I&amp;rsquo;m Zhilong Liu — a PostgreSQL DBA based in China.&lt;/p&gt;
&lt;p&gt;This blog is where I document my deep dives into PostgreSQL internals, production incident analysis, source code walkthroughs, and paper reviews. I write primarily in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;, and I&amp;rsquo;m building this English section to share key insights with the global PostgreSQL community.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What I Write About
 &lt;div id="what-i-write-about" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-i-write-about" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Case Studies&lt;/strong&gt; — Real production incidents and how they were resolved&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Internals&lt;/strong&gt; — PostgreSQL mechanisms explained from first principles&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Source Code&lt;/strong&gt; — Deep dives into specific subsystems (vacuum, locking, WAL, planner)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Paper Reviews&lt;/strong&gt; — Academic papers on databases, interpreted for practitioners&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI &amp;amp; Databases&lt;/strong&gt; — AIOps, MCP, and the intersection of AI with database operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Contact
 &lt;div id="contact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#contact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/liuzhilong62" target="_blank" rel="noreferrer"&gt;liuzhilong62&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;X (Twitter): &lt;a href="https://x.com/liuzhilong62" target="_blank" rel="noreferrer"&gt;@liuzhilong62&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Email: &lt;a href="mailto:liuzhilong62@outlook.com" &gt;liuzhilong62@outlook.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All content is licensed under &lt;a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank" rel="noreferrer"&gt;CC BY-NC-SA 4.0&lt;/a&gt;.&lt;/p&gt;</description><content:encoded>
&lt;h2 class="relative group"&gt;The Last DBA
 &lt;div id="the-last-dba" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#the-last-dba" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h2&gt;
&lt;p&gt;Hi, I&amp;rsquo;m Zhilong Liu — a PostgreSQL DBA based in China.&lt;/p&gt;
&lt;p&gt;This blog is where I document my deep dives into PostgreSQL internals, production incident analysis, source code walkthroughs, and paper reviews. I write primarily in Chinese on &lt;a href="https://lastdba.com" target="_blank" rel="noreferrer"&gt;lastdba.com&lt;/a&gt;, and I&amp;rsquo;m building this English section to share key insights with the global PostgreSQL community.&lt;/p&gt;

&lt;h3 class="relative group"&gt;What I Write About
 &lt;div id="what-i-write-about" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#what-i-write-about" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Case Studies&lt;/strong&gt; — Real production incidents and how they were resolved&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Internals&lt;/strong&gt; — PostgreSQL mechanisms explained from first principles&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Source Code&lt;/strong&gt; — Deep dives into specific subsystems (vacuum, locking, WAL, planner)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Paper Reviews&lt;/strong&gt; — Academic papers on databases, interpreted for practitioners&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI &amp;amp; Databases&lt;/strong&gt; — AIOps, MCP, and the intersection of AI with database operations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 class="relative group"&gt;Contact
 &lt;div id="contact" class="anchor"&gt;&lt;/div&gt;
 
 &lt;span
 class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"&gt;
 &lt;a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#contact" aria-label="Anchor"&gt;#&lt;/a&gt;
 &lt;/span&gt;
 
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/liuzhilong62" target="_blank" rel="noreferrer"&gt;liuzhilong62&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;X (Twitter): &lt;a href="https://x.com/liuzhilong62" target="_blank" rel="noreferrer"&gt;@liuzhilong62&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Email: &lt;a href="mailto:liuzhilong62@outlook.com" &gt;liuzhilong62@outlook.com&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All content is licensed under &lt;a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" target="_blank" rel="noreferrer"&gt;CC BY-NC-SA 4.0&lt;/a&gt;.&lt;/p&gt;</content:encoded></item></channel></rss>